Abstract

Sharing, an abstract domain developed by D. Jacobs and A. Langen for the analysis of 
logic programs, derives useful aliasing information. It is well-known that a commonly 
used core of techniques, such as the integration of Sharing with freeness and linearity 
information, can significantly improve the precision of the analysis. However, a number 
of other proposals for refined domain combinations have been circulating for years. One 
feature that is common to these proposals is that they do not seem to have undergone 
a thorough experimental evaluation even with respect to the expected precision gains. In 
this paper we experimentally evaluate: helping Sharing with the definitely ground variables 
found using Pos, the domain of positive Boolean formulas; the incorporation of explicit 
structural information; a full implementation of the reduced product of Sharing and Pos; 
the issue of reordering the bindings in the computation of the abstract mgu; an original 
proposal for the addition of a new mode recording the set of variables that are deemed to 
be ground or free; a refined way of using linearity to improve the analysis; the recovery 
of hidden information in the combination of Sharing with freeness information. Finally, 
we discuss the issue of whether tracking compoundness allows the computation of more 
sharing information. 

KEYWORDS: Abstract Interpretation; Logic Programming; Sharing Analysis; Experi- 
mental Evaluation. 



1 Introduction 

In the execution of a logic program, two variables are aliased or share at some 
program point if they are bound to terms that have a common variable. Conversely, 
two variables are independent if they are bound to terms that have no variables in 
common. Thus by providing information about possible variable aliasing, we also 
provide information about definite variable independence. In logic programming,
a knowledge of the possible aliasing (and hence definite independence) between
variables has some important applications.

* The work of the first and second authors has been partly supported by MURST
projects "Certificazione automatica di programmi mediante interpretazione
astratta" and "Interpretazione astratta, sistemi di tipo e analisi control-flow."

† This work was partly supported by EPSRC under grant GR/M05645.

Information about variable aliasing is essential for the efficient exploitation of
AND-parallelism (Bueno et al. 1994; Bueno et al. 1999; Chang et al. 1985;
Hermenegildo and Greene 1990; Hermenegildo and Rossi 1995; Jacobs and Langen
1992; Muthukumar and Hermenegildo 1992).
Informally, two atoms in a goal are executed in parallel if, by a mixture of compile- 
time and run-time checks, it can be guaranteed that they do not share any variable. 
This implies the absence of binding conflicts at run-time: it will never happen that 
the processes associated to the two atoms try to bind the same variable. 

Another significant application is occurs-check reduction (Crnogorac et al. 1996;
Søndergaard 1986). It is well-known that many implemented logic programming
languages (e.g., almost all Prolog systems) omit the occurs-check from the unifi- 
cation procedure. Occurs-check reduction amounts to identifying the unifications 
where such an omission is safe, and, for this purpose, information on the possible 
aliasing of program variables is crucial. 

Aliasing information can also be used indirectly in the computation of other 
interesting program properties. For instance, the precision with which freeness in- 
formation can be computed depends, in a critical way, on the precision with which 
aliasing can be tracked (Bruynooghe et al. 1994a; Codish et al. 1993; Filé 1994;
King and Soper 1994; Langen 1990; Muthukumar and Hermenegildo 1991).

In addition to these well-known applications, a recent line of research has shown 
that aliasing information can be exploited in Inductive Logic Programming (ILP). 
Several optimizations have been proposed for speeding up the refinement of
inductively defined predicates in ILP systems (Blockeel et al. 2000; Santos
Costa et al. 2000). It has been observed that the applicability of some of
these optimizations, formulated in terms of syntactic conditions on the
considered predicate, could be recast as tests on variable aliasing (Blockeel
et al. 2000, Appendix D).

Sharing, a domain due to D. Jacobs and A. Langen (Jacobs and Langen 1989;
Jacobs and Langen 1992; Langen 1990), is based on the concept of set-sharing.
An element of the Sharing domain, which is a set of sharing-groups (i.e., a set
of sets of variables), represents information on groundness,¹ groundness
dependencies, possible aliasing, and more complex sharing-dependencies among
the variables that are involved in the execution of a logic program (Bagnara
et al. 1997; Bagnara et al. 2002; Bueno et al. 1994; Bueno et al. 1999).

Even though Sharing is quite precise, it is well-known that more precision is
attainable by combining it with other domains. Nowadays, nobody would seriously
consider performing sharing analysis without exploiting the combination of
aliasing information with groundness and linearity information. As a
consequence, expressions such as 'sharing information', 'sharing domain' and
'sharing analysis' usually capture groundness, aliasing, linearity and quite
often also freeness. Notice that this idiom is nothing more than a historical
accident: as we will see in the sequel, compoundness and other kinds of
structural information could also be included in the collective term 'sharing
information'.

¹ A variable is ground if it is bound to a term containing no variables, it is
compound if it is bound to a non-variable term, it is free if it is not
compound, it is linear if it is bound to a term that does not contain multiple
occurrences of a variable.

As argued informally by H. Søndergaard (Søndergaard 1986), linearity
information can be suitably exploited to improve the accuracy of a sharing
analysis. This observation has been formally applied in (Codish et al. 1991) to
the specification of the abstract mgu operator for ASub, a sharing domain based
on the concept of pair-sharing (i.e., aliasing and linearity information is
encoded by a set of pairs of variables). A similar integration with linearity
for the domain Sharing was proposed by Langen in his PhD thesis (Langen 1990).
The synergy attainable from the integration between aliasing and freeness
information was pointed out by K. Muthukumar and M. Hermenegildo (Muthukumar
and Hermenegildo 1992). Building on these works, W. Hans and S. Winkler (Hans
and Winkler 1992) proposed a combined integration of freeness and linearity
information with sharing, but small variations (such as the one we will present
as the starting point for our work) have been developed by M. Bruynooghe et al.
(Bruynooghe and Codish 1993; Bruynooghe et al. 1994a).

There have been a number of other proposals for more refined combinations 
which have the potential for improving the precision of the sharing analysis over 
and above that obtainable using the classical combinations of Sharing with linearity 
and freeness. These include the implementation of more powerful abstract seman- 
tic operators (since it is well-known that the commonly used ones are sub-optimal) 
and/or the integration with other domains. None of these proposals seems to have
undergone a thorough experimental evaluation, even with respect to the expected
precision gains. The goal of this paper is to systematically study these enhance- 
ments and provide a uniform theoretical presentation together with an extensive 
experimental evaluation that will give a strong indication of their impact on the 
accuracy of the sharing information. 

Our investigation is primarily from the point of view of precision. Reasonable 
efficiency is also clearly of interest but this has to be secondary to the question 
as to whether precision is significantly improved: only if this is established, should 
better implementations be researched. One of the investigated enhancements is 
the integration of explicit structural information in the sharing analysis and an 
important contribution of this paper is that it shows both the feasibility and the 
positive impact of this combination. 

Note that, regardless of its practicality, any feasible sharing analysis technique 
that offers good precision may be valuable. While inefficiency may prevent its adop- 
tion in production analyzers, it can help in assessing the precision of the more 
competitive techniques. 

The present paper, which is an improved and extended version of (Bagnara et
al. 2000), is structured as follows. In Section 2 we define some notation and
recall the definitions of the domain Sharing and its standard integration with
freeness and linearity information, denoted as SFL. In Section 3 we briefly
describe the China analyzer, the benchmark suite and the methodology we follow
in the experimental evaluations. In each of the next seven sections, we
describe and experimentally evaluate different enhancements and precision
optimizations for the domain SFL. Section 4 considers a simple combination of
Pos with SFL; Section 5 investigates the effect of including explicit
structural information by means of the Pattern(·) construction; Section 6
discusses possible heuristics for reordering the bindings so as to maximize
the precision of SFL; Section 7 studies the implementation of a more precise
combination between Pos and SFL; Section 8 describes a new mode 'ground or
free' to be included in SFL; Section 9 and Section 10 study the possibility of
improving the exploitation of the linearity and freeness information already
encoded in SFL. In Section 11 we discuss (without an experimental evaluation)
whether compoundness information can be useful for precision gains. Section 12
concludes with some final remarks.

2 Preliminaries 

For any set $S$, $\wp(S)$ denotes the powerset of $S$. For ease of presentation,
we assume there is a finite set of variables of interest denoted by
$\mathit{VI}$. If $t$ is a syntactic object then $\mathit{vars}(t)$ and
$\mathit{mvars}(t)$ denote the set and the multiset of variables in $t$,
respectively. If $a$ occurs more than once in a multiset $M$ we write
$a \Subset M$. We let $\mathit{Terms}$ denote the set of first-order terms over
$\mathit{VI}$. $\mathit{Bind}$ denotes the set of equations of the form $x = t$
where $x \in \mathit{VI}$ and $t \in \mathit{Terms}$ is distinct from $x$. Note
that we do not impose the occurs-check condition $x \notin \mathit{vars}(t)$,
since we target the analysis of Prolog and CLP systems possibly omitting this
check. The following simplification of the standard definitions for the Sharing
domain (Cortesi and Filé 1999; Hill et al. 1998; Jacobs and Langen 1992)
assumes that the set of variables of interest is always given by
$\mathit{VI}$.²

² Note that, during the analysis process, the set of variables of interest may
expand (when solving the body of a clause) and contract (when abstract
descriptions are projected onto the variables occurring in the head of a
clause). However, at any given time the set of variables of interest is fixed.
By consistently denoting this set by $\mathit{VI}$, we simplify the
presentation, since we can omit the set of variables of interest to which an
abstract description refers.

Definition 1

(The set-sharing domain $\mathit{SH}$.) The set $\mathit{SH}$ is defined by

\[ \mathit{SH} \stackrel{\mathrm{def}}{=} \wp(\mathit{SG}), \]

where the set of sharing-groups $\mathit{SG}$ is given by

\[ \mathit{SG} \stackrel{\mathrm{def}}{=} \wp(\mathit{VI}) \setminus \{\emptyset\}. \]

$\mathit{SH}$ is ordered by subset inclusion. Thus the lub and glb of the domain
are set union and intersection, respectively.

Definition 2

(Abstract operations over $\mathit{SH}$.) The abstract existential
quantification on $\mathit{SH}$ causes an element of $\mathit{SH}$ to "forget
everything" about a subset of the variables of interest. It is encoded by the
binary function
$\mathrm{aexists} \colon \mathit{SH} \times \wp(\mathit{VI}) \to \mathit{SH}$
such that, for each $\mathit{sh} \in \mathit{SH}$ and $V \in \wp(\mathit{VI})$,

\[ \mathrm{aexists}(\mathit{sh}, V) \stackrel{\mathrm{def}}{=}
   \{\, S \setminus V \mid S \in \mathit{sh},\ S \setminus V \neq \emptyset \,\}
   \cup \{\, \{x\} \mid x \in V \,\}. \]

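For each $\mathit{sh} \in \mathit{SH}$ and each $V \in \wp(\mathit{VI})$, the
extraction of the relevant component of $\mathit{sh}$ with respect to $V$ is
given by the function
$\mathrm{rel} \colon \wp(\mathit{VI}) \times \mathit{SH} \to \mathit{SH}$
defined as

\[ \mathrm{rel}(V, \mathit{sh}) \stackrel{\mathrm{def}}{=}
   \{\, S \in \mathit{sh} \mid S \cap V \neq \emptyset \,\}. \]

For each $\mathit{sh} \in \mathit{SH}$ and each $V \in \wp(\mathit{VI})$, the
function
$\overline{\mathrm{rel}} \colon \wp(\mathit{VI}) \times \mathit{SH} \to \mathit{SH}$
gives the irrelevant component of $\mathit{sh}$ with respect to $V$. It is
defined as

\[ \overline{\mathrm{rel}}(V, \mathit{sh}) \stackrel{\mathrm{def}}{=}
   \mathit{sh} \setminus \mathrm{rel}(V, \mathit{sh}). \]

The function $(\cdot)^\star \colon \mathit{SH} \to \mathit{SH}$, also called
star-union, is given, for each $\mathit{sh} \in \mathit{SH}$, by

\[ \mathit{sh}^\star \stackrel{\mathrm{def}}{=}
   \Bigl\{\, S \in \mathit{SG} \Bigm| \exists n \geq 1 \mathrel{.}
   \exists T_1, \ldots, T_n \in \mathit{sh} \mathrel{.}
   S = \bigcup_{i=1}^{n} T_i \,\Bigr\}. \]

For each $\mathit{sh}_1, \mathit{sh}_2 \in \mathit{SH}$, the function
$\mathrm{bin} \colon \mathit{SH} \times \mathit{SH} \to \mathit{SH}$, called
binary union, is given by

\[ \mathrm{bin}(\mathit{sh}_1, \mathit{sh}_2) \stackrel{\mathrm{def}}{=}
   \{\, S_1 \cup S_2 \mid S_1 \in \mathit{sh}_1,\ S_2 \in \mathit{sh}_2 \,\}. \]

We also use the self-bin-union function
$\mathrm{sbin} \colon \mathit{SH} \to \mathit{SH}$, which is given, for each
$\mathit{sh} \in \mathit{SH}$, by

\[ \mathrm{sbin}(\mathit{sh}) \stackrel{\mathrm{def}}{=}
   \mathrm{bin}(\mathit{sh}, \mathit{sh}). \]

The function
$\mathrm{amgu} \colon \mathit{SH} \times \mathit{Bind} \to \mathit{SH}$
captures the effect of a binding on an element of $\mathit{SH}$. Assume
$(x = t) \in \mathit{Bind}$, $\mathit{sh} \in \mathit{SH}$, $V_x = \{x\}$,
$V_t = \mathit{vars}(t)$, and $V_{xt} = V_x \cup V_t$. Then

\[ \mathrm{amgu}(\mathit{sh}, x = t) \stackrel{\mathrm{def}}{=}
   \overline{\mathrm{rel}}(V_{xt}, \mathit{sh}) \cup
   \mathrm{bin}\bigl(\mathrm{rel}(V_x, \mathit{sh})^\star,
                     \mathrm{rel}(V_t, \mathit{sh})^\star\bigr).
   \tag{1} \]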
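The operations of Definition 2 translate almost literally into set manipulations. The following is a minimal sketch, not the China implementation: it assumes that a sharing group is a Python `frozenset` of single-character variable names, that an element of $\mathit{SH}$ is a `set` of such groups, and that a binding $x = t$ is passed as `x` together with the variables of `t`. The names `irrel`, `bin_union`, `star` and the helper `groups` are illustrative choices.

```python
def rel(V, sh):
    """Relevant component: groups of sh sharing a variable with V."""
    return {S for S in sh if S & V}

def irrel(V, sh):
    """Irrelevant component: groups of sh disjoint from V."""
    return sh - rel(V, sh)

def bin_union(sh1, sh2):
    """Binary union: every union of a group of sh1 with a group of sh2."""
    return {S1 | S2 for S1 in sh1 for S2 in sh2}

def star(sh):
    """Star-union: closure of sh under union of one or more groups."""
    closure = set(sh)
    while True:
        bigger = closure | bin_union(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger

def aexists(sh, V):
    """Forget everything about the variables in V."""
    return {S - V for S in sh if S - V} | {frozenset({x}) for x in V}

def amgu(sh, x, t_vars):
    """Equation (1) for the binding x = t, where t_vars stands for vars(t)."""
    Vx, Vt = frozenset({x}), frozenset(t_vars)
    return irrel(Vx | Vt, sh) | bin_union(star(rel(Vx, sh)), star(rel(Vt, sh)))

def groups(*names):
    """Hypothetical helper: groups("xy", "z") builds {{x, y}, {z}}."""
    return {frozenset(n) for n in names}

# Binding x = f(y, z) on three independent variables.
print(amgu(groups("x", "y", "z"), "x", "yz"))
# Binding x = a (no variables in the term) grounds both x and y.
print(amgu(groups("xy", "z"), "x", ""))
```

In the second call the groups containing `x` disappear altogether, which is how groundness of `x` and `y` is represented in this domain.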

We now briefly recall the standard integration of set-sharing with freeness and 
linearity information. These properties are each represented by a set of variables, 
namely those variables that are bound to terms that definitely enjoy the given 
property. These sets are partially ordered by reverse subset inclusion so that the
lub and glb operators are given by set intersection and union, respectively.

Definition 3 

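(The domain SFL.) Let $F \stackrel{\mathrm{def}}{=} \wp(\mathit{VI})$ and
$L \stackrel{\mathrm{def}}{=} \wp(\mathit{VI})$ be partially ordered by
reverse subset inclusion. The domain SFL is defined by the Cartesian product

\[ \mathit{SFL} \stackrel{\mathrm{def}}{=} \mathit{SH} \times F \times L \]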

ordered by the component-wise extension of the orderings defined on the three 
subdomains. 

A complete definition would explicitly deal with the set of variables of
interest $\mathit{VI}$. We could even define an equivalence relation on SFL
identifying the bottom element
$\bot \stackrel{\mathrm{def}}{=} (\emptyset, \mathit{VI}, \mathit{VI})$ with
all the elements corresponding to an impossible concrete computation state: for
example, elements $(\mathit{sh}, f, l) \in \mathit{SFL}$ such that
$f \nsubseteq \mathit{vars}(\mathit{sh})$ (because a free variable does share
with itself) or
$\mathit{VI} \setminus \mathit{vars}(\mathit{sh}) \nsubseteq l$ (because
variables that cannot share are also linear). Note however that these and other
similar spurious elements rarely occur in practice and cannot compromise the
correctness of the results.

In a bottom-up abstract interpretation framework, such as the one we focus on, 
abstract unification is the only critical operation. Besides unification, the analysis 
depends on the 'merge-over-all-paths' operator, corresponding to the lub of the 
domain, and the abstract projection operator, which can be defined in terms of an 
abstract existential quantification operator. 



Definition 4 

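(Abstract operations over SFL.) The abstract existential quantification on SFL
is encoded by the binary function
$\mathrm{aexists} \colon \mathit{SFL} \times \wp(\mathit{VI}) \to \mathit{SFL}$
such that, for each $d = (\mathit{sh}, f, l) \in \mathit{SFL}$ and
$V \in \wp(\mathit{VI})$,

\[ \mathrm{aexists}(d, V) \stackrel{\mathrm{def}}{=}
   \bigl(\mathrm{aexists}(\mathit{sh}, V),\ f \cup V,\ l \cup V\bigr). \]

For each $d = (\mathit{sh}, f, l) \in \mathit{SFL}$, we define the following
predicates. The predicate
$\mathit{ind}_d \colon \mathit{Terms} \times \mathit{Terms} \to \mathit{Bool}$
expresses definite independence of terms. Two terms
$s, t \in \mathit{Terms}$ are independent in $d$ if and only if
$\mathit{ind}_d(s, t)$ holds, where

\[ \mathit{ind}_d(s, t) \stackrel{\mathrm{def}}{=}
   \bigl(\mathrm{rel}(\mathit{vars}(s), \mathit{sh}) \cap
         \mathrm{rel}(\mathit{vars}(t), \mathit{sh}) = \emptyset\bigr). \]

A term $t \in \mathit{Terms}$ is free in $d$ if and only if the predicate
$\mathit{free}_d \colon \mathit{Terms} \to \mathit{Bool}$ holds for $t$, that
is,

\[ \mathit{free}_d(t) \stackrel{\mathrm{def}}{=}
   (\exists x \in \mathit{VI} \mathrel{.} x = t \land x \in f). \]

A term $t \in \mathit{Terms}$ is linear in $d$ if and only if
$\mathit{lin}_d(t)$, where
$\mathit{lin}_d \colon \mathit{Terms} \to \mathit{Bool}$ is given by

\[ \begin{aligned}
   \mathit{lin}_d(t) \stackrel{\mathrm{def}}{=} {}
     & \bigl(\mathit{vars}(t) \subseteq l\bigr) \\
     & \land \bigl(\forall x, y \in \mathit{vars}(t) \mathrel{:}
               x = y \lor \mathit{ind}_d(x, y)\bigr) \\
     & \land \bigl(\forall x \in \mathit{vars}(t) \mathrel{:}
               x \Subset \mathit{mvars}(t) \implies
               x \notin \mathit{vars}(\mathit{sh})\bigr).
   \end{aligned} \]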

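The function
$\mathrm{amgu} \colon \mathit{SFL} \times \mathit{Bind} \to \mathit{SFL}$
captures the effects of a binding on an element of SFL. Let
$(x = t) \in \mathit{Bind}$ and $d = (\mathit{sh}, f, l) \in \mathit{SFL}$.
Let also $V_x = \{x\}$, $V_t = \mathit{vars}(t)$, $V_{xt} = V_x \cup V_t$,
$R_x = \mathrm{rel}(V_x, \mathit{sh})$ and
$R_t = \mathrm{rel}(V_t, \mathit{sh})$. Then

\[ \mathrm{amgu}(d, x = t) \stackrel{\mathrm{def}}{=} (\mathit{sh}', f', l'), \]

where

\[ \begin{aligned}
   \mathit{sh}' &\stackrel{\mathrm{def}}{=}
     \overline{\mathrm{rel}}(V_{xt}, \mathit{sh}) \cup \mathrm{bin}(S_x, S_t); \\
   S_x &\stackrel{\mathrm{def}}{=}
     \begin{cases}
       R_x, & \text{if } \mathit{free}_d(x) \lor \mathit{free}_d(t) \lor
              \bigl(\mathit{lin}_d(t) \land \mathit{ind}_d(x, t)\bigr); \\
       R_x^\star, & \text{otherwise;}
     \end{cases} \\
   S_t &\stackrel{\mathrm{def}}{=}
     \begin{cases}
       R_t, & \text{if } \mathit{free}_d(x) \lor \mathit{free}_d(t) \lor
              \bigl(\mathit{lin}_d(x) \land \mathit{ind}_d(x, t)\bigr); \\
       R_t^\star, & \text{otherwise;}
     \end{cases} \\
   f' &\stackrel{\mathrm{def}}{=}
     \begin{cases}
       f, & \text{if } \mathit{free}_d(x) \land \mathit{free}_d(t); \\
       f \setminus \mathit{vars}(R_x), & \text{if } \mathit{free}_d(x); \\
       f \setminus \mathit{vars}(R_t), & \text{if } \mathit{free}_d(t); \\
       f \setminus \mathit{vars}(R_x \cup R_t), & \text{otherwise;}
     \end{cases} \\
   l' &\stackrel{\mathrm{def}}{=}
     \bigl(\mathit{VI} \setminus \mathit{vars}(\mathit{sh}')\bigr)
     \cup f' \cup l''; \\
   l'' &\stackrel{\mathrm{def}}{=}
     \begin{cases}
       l \setminus \bigl(\mathit{vars}(R_x) \cap \mathit{vars}(R_t)\bigr),
         & \text{if } \mathit{lin}_d(x) \land \mathit{lin}_d(t); \\
       l \setminus \mathit{vars}(R_x), & \text{if } \mathit{lin}_d(x); \\
       l \setminus \mathit{vars}(R_t), & \text{if } \mathit{lin}_d(t); \\
       l \setminus \mathit{vars}(R_x \cup R_t), & \text{otherwise.}
     \end{cases}
   \end{aligned} \]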
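To make the case analysis of Definition 4 concrete, here is a minimal Python sketch of the predicates and of the amgu operator on SFL. It is an illustration under stated assumptions rather than the analyzer's code: a variable is encoded as a `str`, a compound term or constant as a tuple `(functor, arg1, ..., argn)`, a sharing group as a `frozenset`, and the set of variables of interest is passed explicitly as `VI`. All function names are chosen for this example.

```python
def rel(V, sh):
    return {S for S in sh if S & V}

def bin_union(sh1, sh2):
    return {S1 | S2 for S1 in sh1 for S2 in sh2}

def star(sh):
    closure = set(sh)
    while True:
        bigger = closure | bin_union(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger

def vars_of(sh):
    """vars(sh): every variable occurring in some sharing group."""
    return set().union(*sh)

def mvars(t):
    """Variable occurrences of t (a str is a variable, a tuple a compound)."""
    return [t] if isinstance(t, str) else [v for a in t[1:] for v in mvars(a)]

def ind(d, s, t):
    sh = d[0]
    return not (rel(set(mvars(s)), sh) & rel(set(mvars(t)), sh))

def free(d, t):
    return isinstance(t, str) and t in d[1]

def lin(d, t):
    sh, _, l = d
    occ = mvars(t)
    vs = set(occ)
    return (vs <= l
            and all(x == y or ind(d, x, y) for x in vs for y in vs)
            and all(occ.count(x) == 1 or x not in vars_of(sh) for x in vs))

def amgu(d, x, t, VI):
    sh, f, l = d
    Vt = set(mvars(t))
    Rx, Rt = rel({x}, sh), rel(Vt, sh)
    fx, ft = free(d, x), free(d, t)
    lx, lt, ixt = lin(d, x), lin(d, t), ind(d, x, t)
    Sx = Rx if fx or ft or (lt and ixt) else star(Rx)
    St = Rt if fx or ft or (lx and ixt) else star(Rt)
    sh1 = (sh - rel({x} | Vt, sh)) | bin_union(Sx, St)
    vx, vt = vars_of(Rx), vars_of(Rt)
    if fx and ft:
        f1 = f
    elif fx:
        f1 = f - vx
    elif ft:
        f1 = f - vt
    else:
        f1 = f - (vx | vt)
    if lx and lt:
        l2 = l - (vx & vt)
    elif lx:
        l2 = l - vx
    elif lt:
        l2 = l - vt
    else:
        l2 = l - (vx | vt)
    return sh1, f1, (VI - vars_of(sh1)) | f1 | l2

VI = {"A", "X", "Y"}
d0 = ({frozenset("A"), frozenset("X"), frozenset("Y")}, set(), set(VI))
t, a = ("f", "X", "X", "Y"), ("a",)
d1 = amgu(amgu(d0, "A", t, VI), "X", a, VI)  # A = f(X,X,Y), then X = a
d2 = amgu(amgu(d0, "X", a, VI), "A", t, VI)  # X = a, then A = f(X,X,Y)
print(d1)
print(d2)
```

Starting from three independent, linear, non-free variables, both orders yield the same sharing component, but only the order that grounds `X` first keeps `A` in the linearity component. The operator is therefore sensitive to the order in which bindings are processed.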

This specification of the abstract unification operator is equivalent (modulo
the lack of the explicit structural information provided by abstract equation
systems) to that given in (Bruynooghe et al. 1994a), provided
$x \notin \mathit{vars}(t)$. Indeed, as done in all the previous papers on the
subject, in (Bruynooghe et al. 1994a) it is assumed that the analyzed language
does perform the occurs-check. As a consequence, whenever considering a
definitely cyclic binding, that is a binding $x = t$ such that
$x \in \mathit{vars}(t)$, the abstract operator can detect the definite failure
of the concrete computation and thus return the bottom element of the domain.
Such an improvement would not be safe in our case, since we also consider
languages possibly omitting the occurs-check. However, when dealing with
definitely cyclic bindings, the specification given by the previous definition
can still be refined as follows.



Definition 5 

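(Improvement for definitely cyclic bindings.) Consider the specification of the
abstract operations over SFL given in Definition 4. Then, whenever
$x \in \mathit{vars}(t)$, the computation of the new sharing component
$\mathit{sh}'$ can be replaced by the following.³

\[ \mathit{sh}' \stackrel{\mathrm{def}}{=}
   \overline{\mathrm{rel}}(V_{xt}, \mathit{sh}) \cup
   \mathrm{bin}(S_x, \mathit{CS}_t), \]

where

\[ \begin{aligned}
   \mathit{CS}_t &\stackrel{\mathrm{def}}{=}
     \begin{cases}
       \mathit{CR}_t, & \text{if } \mathit{free}_d(x); \\
       \mathit{CR}_t^\star, & \text{otherwise;}
     \end{cases} \\
   \mathit{CR}_t &\stackrel{\mathrm{def}}{=}
     \mathrm{rel}\bigl(\mathit{vars}(t) \setminus \{x\}, \mathit{sh}\bigr).
   \end{aligned} \]

³ Note that, in this special case, it also holds that
$\mathit{free}_d(t) = \mathit{false}$ and
$\mathit{ind}_d(x, t) = (R_x = \emptyset)$.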

This enhancement, already implemented in the China analyzer, is the rewording
of a similar one proposed in (Bagnara 1997a) for the domain Pos in the context
of groundness analysis. Its net effect is to recover some groundness and sharing
dependencies that are unnecessarily lost when using the standard operators.
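A small worked instance may help to see which dependencies are recovered. The sketch below compares the sharing component of Definition 4 with the refinement of Definition 5 on the cyclic binding $x = f(x, y)$. It only covers the case where $x$ is not free and $R_x$ is non-empty, so that both operators use the star-union branches; the encoding (groups as `frozenset`s of single-character variable names) and the function names are assumptions of this example.

```python
def rel(V, sh):
    return {S for S in sh if S & V}

def bin_union(sh1, sh2):
    return {S1 | S2 for S1 in sh1 for S2 in sh2}

def star(sh):
    closure = set(sh)
    while True:
        bigger = closure | bin_union(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger

def sh_standard(sh, x, t_vars):
    """Definition 4 with S_x = R_x* and S_t = R_t* (x not free, x and t share)."""
    Vx, Vt = {x}, set(t_vars)
    return (sh - rel(Vx | Vt, sh)) | bin_union(star(rel(Vx, sh)), star(rel(Vt, sh)))

def sh_cyclic(sh, x, t_vars):
    """Definition 5 for x in vars(t) with x not free: CS_t = CR_t*."""
    Vx, Vt = {x}, set(t_vars)
    CRt = rel(Vt - Vx, sh)  # x itself is left out of the term side
    return (sh - rel(Vx | Vt, sh)) | bin_union(star(rel(Vx, sh)), star(CRt))

def groups(*names):
    """Hypothetical helper: groups("xy", "z") builds {{x, y}, {z}}."""
    return {frozenset(n) for n in names}

sh = groups("x", "y", "xz")
print(sh_standard(sh, "x", "xy"))  # binding x = f(x, y), standard operator
print(sh_cyclic(sh, "x", "xy"))    # same binding, refined operator
```

With the refined operator every resulting group that contains `x` also contains `y`, so the dependency "if `y` becomes ground then `x` is ground" is kept, whereas the standard operator also produces the groups `{x}` and `{x, z}`.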

The domain $\mathit{SH}$ captures set-sharing. However, the property we wish to
detect is pair-sharing and, for this, it has been shown in (Bagnara et al.
2002) that $\mathit{SH}$ includes unwanted redundancy. The same paper
introduces an upper-closure operator $\rho$ on $\mathit{SH}$ and the domain
$\mathit{PSD} \stackrel{\mathrm{def}}{=} \rho(\mathit{SH})$, which is the
weakest abstraction of $\mathit{SH}$ that is as precise as $\mathit{SH}$ as far
as tracking groundness and pair-sharing is concerned.⁴ A notable advantage of
$\mathit{PSD}$ is that we can replace the star-union operation in the
definition of the amgu by self-bin-union without loss of precision. In
particular, in (Bagnara et al. 2002) it is shown that

\[ \mathrm{amgu}(\mathit{sh}, x = t) =_\rho
   \overline{\mathrm{rel}}(V_{xt}, \mathit{sh}) \cup
   \mathrm{bin}\bigl(\mathrm{sbin}(\mathrm{rel}(V_x, \mathit{sh})),
                     \mathrm{sbin}(\mathrm{rel}(V_t, \mathit{sh}))\bigr),
   \tag{2} \]

where the notation $\mathit{sh}_1 =_\rho \mathit{sh}_2$ means
$\rho(\mathit{sh}_1) = \rho(\mathit{sh}_2)$.

It is important to observe that the complexity of the amgu operator on
$\mathit{SH}$ (1) is exponential in the number of sharing-groups of
$\mathit{sh}$. In contrast, the operator on $\mathit{PSD}$ (2) is
$O(|\mathit{sh}|^4)$. Moreover, checking whether a fixpoint has been reached by
testing $\mathit{sh}_1 =_\rho \mathit{sh}_2$ has complexity
$O(|\mathit{sh}_1|^3 + |\mathit{sh}_2|^3)$. Practically speaking, very often
this makes the difference between thrashing and termination of the analysis in
reasonable time.
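The difference between (1) and (2) can be observed on a single binding. The following sketch, under the same illustrative encoding as before (groups as `frozenset`s, names chosen for this example), applies both operators to $x = f(y, z, w)$ and compares what each result says about ground variables and sharing pairs. It does not implement $\rho$ itself; it only checks the two properties that $\mathit{PSD}$ is meant to preserve, and only on this one example.

```python
from itertools import combinations

def rel(V, sh):
    return {S for S in sh if S & V}

def bin_union(sh1, sh2):
    return {S1 | S2 for S1 in sh1 for S2 in sh2}

def sbin(sh):
    """Self-bin-union: unions of at most two groups."""
    return bin_union(sh, sh)

def star(sh):
    closure = set(sh)
    while True:
        bigger = closure | bin_union(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger

def amgu_with(close, sh, x, t_vars):
    """Shape shared by (1) and (2); `close` is star for (1), sbin for (2)."""
    Vx, Vt = {x}, set(t_vars)
    return (sh - rel(Vx | Vt, sh)) | bin_union(close(rel(Vx, sh)), close(rel(Vt, sh)))

def pairs(sh):
    """Pairs of distinct variables that may share."""
    return {frozenset(p) for S in sh for p in combinations(sorted(S), 2)}

def ground(sh, VI):
    """Variables that occur in no sharing group."""
    return VI - set().union(*sh)

VI = set("xyzw")
sh = {frozenset(v) for v in VI}
r1 = amgu_with(star, sh, "x", "yzw")  # equation (1)
r2 = amgu_with(sbin, sh, "x", "yzw")  # equation (2)
print(len(r1), len(r2))
print(pairs(r1) == pairs(r2), ground(r1, VI) == ground(r2, VI))
```

Here the star-union version also produces the group containing all four variables, which adds no new sharing pair. With more variables in the term the star-union grows exponentially, while the self-bin-union stays quadratic in the size of its argument.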

The above observations on $\mathit{SH}$ and $\mathit{PSD}$ can be generalized
to apply to the domain combinations SFL and
$\mathit{SFL}_2 \stackrel{\mathrm{def}}{=} \mathit{PSD} \times F \times L$. In
particular, $\mathit{SFL}_2$ achieves the same precision as SFL for groundness,
pair-sharing, freeness and linearity and the complexity of the corresponding
abstract unification operator is polynomial.
For this reason, all the experimental work in this paper, with the exception of
part of the one described in Section 5, has been conducted using the
$\mathit{SFL}_2$ domain.



3 Experimental Evaluation 

Since the main purpose of this paper is to provide an experimental measure of the 
precision gains that might be achieved by enhancing a standard sharing analysis 
with several new techniques we found in the literature, it is clear that the implemen- 
tation of the various domain combinations was a major part of the work. However, 
so as to adapt these assorted proposals into a uniform framework and provide a
fair comparison of their results, a large amount of underlying conceptual work
was also required. For instance, almost all of the proposed enhancements were
designed for systems that perform the occurs-check and some of them were
developed for rather different abstract domains: besides changing the
representation of the domain elements, such a situation usually requires a
reconsideration of the specification of the abstract operators.

⁴ The name PSD, which stands for Pair-Sharing Dependencies, was introduced in
(Zaffanella et al. 1999). All previous papers, including (Bagnara et al. 2002),
denoted this domain by $\mathit{SH}^\rho$.

All the experiments have been conducted using the China analyzer (Bagnara
1997a) on a GNU/Linux PC system equipped with an AMD Athlon clocked at 700 MHz
and 256 MB of RAM. China is a data-flow analyzer for
CLP($\mathcal{H}_\mathcal{N}$) languages (i.e., ISO Prolog, CLP($\mathcal{R}$),
clp(FD) and so forth), $\mathcal{H}_\mathcal{N}$ being an extended Herbrand
system where the values of a numeric domain $\mathcal{N}$ can occur as leaves
of the terms. China, which is written in C++, performs bottom-up analysis
deriving information on both call-patterns and success-patterns by means of
program transformations and optimized fixpoint computation techniques. An
abstract description is computed for the call- and success-patterns for each
predicate defined in the program using a sophisticated chaotic iteration
strategy proposed in (Bourdoncle 1993a).⁵

A major point of the experimental evaluation is given by the test-suite, which 
is probably the largest one ever reported in the literature on data-flow analysis of 
(constraint) logic programs. The suite comprises all the programs we have access to 
(i.e., everything we could find by systematically dredging the Internet): more than 
330 programs, 24 MB of code, 800 K lines. Besides classical benchmarks, several real 
programs of respectable size are included, the largest one containing 10063 clauses in 
45658 lines of code. The suite also comprises a few synthetic benchmarks, which are 
artificial programs explicitly constructed to stress the capabilities of the analyzer 
and of its abstract domains with respect to precision and/or efficiency. 

Because of the exponential complexity of the base domain SFL, a data-flow
analysis that includes this domain will only be practical if it incorporates
widening operators (Zaffanella et al. 1999).⁶ However, since almost none of the
investigated combinations come with specialized widening operators, for a fair
assessment of the precision improvements we decided to disable all the
widenings available in our SFL implementation. As a consequence, there are a
few benchmarks for which the analysis does not terminate in reasonable time or
absorbs memory beyond acceptable limits, so that a precision comparison is not
possible. Note however that the motivations behind this choice go beyond the
simple observation that widening operators affect the precision of the
analysis: the problem is also that, if we used the widenings defined and tuned
for our implementation of the domain SFL, the results would be biased. In fact,
the definition of a good widening for an analysis domain normally depends on
both the representation and the implementation of the domain. In other words,
different implementations even of the same domain will require different
tunings of the widening operators (or even, possibly, brand new widenings).
This means that adopting the same widening operators for all the domain
combinations would weaken, if not invalidate, any conclusions regarding the
relative benefits of the investigated enhancements. On the other hand, the
definition of a new specialized widening operator for each one of the
considered domain combinations, besides being a formidable task, would also be
wasted effort as the number of benchmark programs for which termination cannot
be obtained within reasonable time is really small.

⁵ China uses the recursive fixpoint iteration strategy on the weak topological
ordering defined by partitioning of the call graph into strongly-connected
subcomponents (Bourdoncle 1993b).

⁶ Note that we use the term 'widening operator' in its broadest sense: any
mechanism whereby, in the course of the analysis, an abstract description is
substituted by one that is less precise.

For space reasons, the experimental results are only summarized here. The
interested reader can find more information (including a description of the
constantly growing benchmark suite and detailed results for each benchmark) at
the URI http://www.cs.unipr.it/China/. Indeed, given the high number of
benchmark programs and the many domain combinations considered,⁷ even finding a
concise, meaningful and practical way to summarize the results has been a
non-trivial task.

For each benchmark, precision is measured by counting the number of independent
pairs (the corresponding columns are labeled 'I' in the tables) as well as the
numbers of definitely ground (labeled 'G'), free ('F') and linear ('L')
variables detected by each abstract domain. The results obtained for different
analyses are compared by computing the relative precision improvements or
degradations on each of these quantities and expressing them using percentages.
The "overall" ('O') precision improvement for the benchmark is also computed as
the maximum improvement on all the measured quantities.⁸ The benchmark suite is
then partitioned into several precision equivalence classes: the cardinalities
of these classes are expressed again using percentages. For example, when
looking at the precision results reported in Table 1 for goal-dependent
analysis, the value 2.3 that can be found at the intersection of the row
labeled '$0 < p \leq 2$' with the column labeled 'G' is to be read as follows:
"for 2.3 percent of the benchmarks the increase in the number of ground
variables is less than or equal to 2 percent." The precision class labeled
'unknown' identifies those benchmarks for which a precision comparison was not
possible, because one or both of the analyses timed out (for all comparisons,
the time-out threshold is 600 seconds). In summary, a precision table gives an
approximation of the distribution of the programs in the benchmark suite with
respect to the obtained precision gains.
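The way a precision table is assembled can be sketched in a few lines. The code below is only an illustration of the procedure just described: the class bounds other than 2 percent, the treatment of a zero base count, the choice of the worst loss as the overall value when a loss is present, and all function names and sample counts are assumptions made for this example.

```python
from collections import Counter

def relative_change(base, enhanced):
    """Percentage change of a count with respect to the base analysis."""
    if base == 0:
        return 0.0 if enhanced == 0 else 100.0  # hypothetical convention
    return 100.0 * (enhanced - base) / base

def overall(changes):
    """'O' value: maximum improvement, unless some measure lost precision."""
    losses = [c for c in changes.values() if c < 0]
    return min(losses) if losses else max(changes.values())

def precision_class(p, bounds=(2, 5, 10, 20)):
    """Class label for a percentage p; bounds beyond 2 are hypothetical."""
    if p is None:
        return "unknown"
    if p == 0:
        return "same precision"
    kind = "improvement" if p > 0 else "degradation"
    lo = 0
    for hi in bounds:
        if abs(p) <= hi:
            return f"{kind}, {lo} < p <= {hi}"
        lo = hi
    return f"{kind}, p > {lo}"

def summarize(base, enhanced):
    """Counts are dicts keyed by 'I', 'G', 'F', 'L'; None means timed out."""
    if base is None or enhanced is None:
        return None
    changes = {k: relative_change(base[k], enhanced[k]) for k in base}
    changes["O"] = overall(changes)
    return changes

# Made-up counts for four benchmarks: (base analysis, enhanced analysis).
runs = [
    ({"I": 100, "G": 50, "F": 20, "L": 40}, {"I": 110, "G": 51, "F": 20, "L": 40}),
    ({"I": 10, "G": 10, "F": 10, "L": 10}, {"I": 10, "G": 10, "F": 10, "L": 10}),
    ({"I": 40, "G": 20, "F": 10, "L": 10}, {"I": 44, "G": 20, "F": 9, "L": 10}),
    (None, None),
]
summaries = [summarize(b, e) for b, e in runs]
o_classes = [precision_class(s["O"] if s else None) for s in summaries]
dist = {c: 100.0 * n / len(runs) for c, n in Counter(o_classes).items()}
print(dist)
```

In the third benchmark the gain on independent pairs is overridden by the loss on free variables, so its overall value falls in a degradation class.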

For a rough estimate of the efficiency of the different analyses, for each
comparison we provide two tables that summarize the times taken by the fixpoint
computations. It should be stressed that these by no means provide a faithful
account of the intrinsic computational cost of the tested domain combinations.
Besides the lack of widenings, which have a big impact on performance as can be
observed by the results reported in (Zaffanella et al. 1999), the reader should
not forget that, for ease of implementation, having targeted precision we
traded efficiency whenever possible. Therefore, these tables provide, so to
speak, upper-bounds: refined implementations can be expected to perform at
least as well as those reported in the tables.

⁷ We compute the results of 40 different variations of the static analysis,
which are then used to perform 36 comparisons. The results are computed over
332 programs for goal-independent analyses and over 221 programs for
goal-dependent analyses. This difference in the number of benchmarks considered
comes from the fact that many programs either are not provided with a set of
entry goals or use constructs such as call(G) where G is a term whose principal
functor is not known. In these cases the analyzer recognizes that
goal-dependent analysis is pointless, since no call-patterns can be excluded.

⁸ When computing this "overall" result for a benchmark, the presence of even a
single precision loss for one of the measures overrides any precision
improvement computed on the other components.

As done for the precision results, the timings are summarized by partitioning 
the suite into equivalence classes and reporting the cardinality of each class using 
percentages. In the first table we consider the distribution of the absolute time 
differences, that is we measure the slow-down and speed-up due to the incorporation 
of the considered enhancement. Note that the class called 'same time' actually 
comprises the benchmarks having a time difference below a given threshold, which 
is fixed at 0.1 seconds. In the second table we show the distribution of the total 
fixpoint computation times, both for the base analysis (in the columns labeled '%1') 
and for the enhanced one (in the columns labeled '%2'); the columns labeled 'Δ'
show how much each total time class grows or shrinks due to the inclusion of the 
considered combination. 
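The two timing tables can be derived along the same lines. The sketch below assumes per-benchmark fixpoint times in seconds for the base and the enhanced analysis; the 0.1 second threshold comes from the text, while the total-time class bounds, the sample times and the function names are placeholders chosen for this example.

```python
from collections import Counter

def diff_class(t_base, t_enh, threshold=0.1):
    """First table: sign of the time difference, with a 'same time' band."""
    d = t_enh - t_base
    if abs(d) < threshold:
        return "same time"
    return "slow-down" if d > 0 else "speed-up"

def time_class(t, bounds=(0.2, 1.0, 5.0)):
    """Second table: total-time class; the bounds are hypothetical."""
    for hi in bounds:
        if t <= hi:
            return f"t <= {hi}"
    return f"t > {bounds[-1]}"

def percentages(labels):
    """Cardinality of each class as a percentage of the suite."""
    return {k: 100.0 * n / len(labels) for k, n in Counter(labels).items()}

base = [0.05, 0.5, 3.0, 0.15]  # made-up times for the base analysis
enh = [0.06, 1.4, 2.0, 0.15]   # made-up times for the enhanced analysis

diffs = percentages([diff_class(b, e) for b, e in zip(base, enh)])
pct1 = percentages([time_class(t) for t in base])  # columns '%1'
pct2 = percentages([time_class(t) for t in enh])   # columns '%2'
delta = {k: pct2.get(k, 0.0) - pct1.get(k, 0.0) for k in set(pct1) | set(pct2)}
print(diffs)
print(delta)
```

A positive entry in `delta` means that the corresponding total-time class grows when the enhancement is switched on.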



4 A Simple Combination with Pos

It is well-known that the domain Sharing (and thus also SFL) keeps track of
ground dependencies. More precisely, Sharing contains Def, the domain of
definite Boolean functions (Armstrong et al. 1998), as a proper subdomain
(Cortesi et al. 1992; Zaffanella et al. 1999). However, we consider here the
combination of SFL with Pos, the domain of positive Boolean functions
(Armstrong et al. 1998). There are several good reasons to couple SFL with Pos:

1. Pos is strictly more expressive than Def in that it can represent (positive)
disjunctive groundness dependencies that arise in the analysis of Prolog
programs (Armstrong et al. 1998). The ability to deal with disjunctive
dependencies is also needed for the precise approximation of the constraints of
some CLP languages: for example, when using the finite domain solver of SICStus
Prolog, the user can write disjunctive constraints such as 'X #= 4 #\/ Y #= 6'.

2. The increased precision on groundness propagates to the SFL component. It
can be exploited to remove redundant sharing groups and to identify more
linear variables, therefore having a positive impact on the computation of the
amgu operator of the SFL domain. Moreover, when dealing with sequences
of bindings, the added groundness information allows them to be usefully
reordered. In fact, while it has been proved that Sharing alone is commutative,
meaning that the result of the analysis does not depend on the ordering in
which the bindings are executed (Hill et al. 1998), the domain SFL does not
enjoy this property. In particular, even for the simpler combination of Sharing
with linearity it has been known since (Langen 1990, pp. 66-67) that better
results are obtained if the grounding bindings are considered before the
others.⁹ As an example, consider the sequences of unifications
$\langle f(X,X,Y) = A,\ X = a \rangle$ and
$\langle X = a,\ f(X,X,Y) = A \rangle$ (Langen 1990, p. 66). The combination
with Pos is clearly advantageous in this respect.

R. Bagnara, E. Zaffanella, and P. M. Hill

3. Besides being useful for improving precision on other properties, disjunctive
dependencies also have a few direct applications, such as occurs-check
reduction. As observed in (Crnogorac et al. 1996), if the groundness formula
$x \lor y$ holds, the unification $x = y$ is occurs-check free, even when
neither $x$ nor $y$ is definitely linear.

4. Detecting the set of definitely ground variables through Pos and exploiting
it to simplify the operations on SFL can improve the efficiency of the
analysis. In particular this is true if the set of ground variables is readily
available, as is the case, for instance, with the GER implementation of Pos
(Bagnara and Schachte 1999).

5. The combination with Pos is essential for the application of a powerful
widening technique on SFL (Zaffanella et al. 1999). This is very important,
since analysis based on SFL is not practical without widenings.

6. In the context of the analysis of CLP programs, the notion of "ground
variable" and the notion of "variable that cannot share a common variable
with other variables" are distinct. A numeric variable in, say, CLP(R), cannot
share with other numeric variables (not in the sense of interest in this
paper) but is not ground unless it has been constrained to a unique value. Thus
the analysis of CLP programs with SFL alone either will lose precision on 
pair-sharing (if arithmetic constraints are abstracted into "sharings" among 
numeric variables in order to approximate the groundness of the latter) or 
will be imprecise on the groundness of numeric variables (because only Her- 
brand constraints take part in the construction of sharing-sets). In the first 
alternative, as we have already noted, the precision with which groundness 
of numeric variables can be tracked will also be limited. Since groundness of 
numeric variables is important for a number of applications (e.g., compiling 
equality constraints down to assignments or tests in some circumstances), we 
advocate the use of Pos and SFL at the same time. 

Thus, as a first technique to enhance the precision of sharing analysis, we con- 
sider the simple propagation of the set of definitely ground variables from the Pos 
component to the SFL component. We denote this domain by Pos x SFL. 

As noted above, the GER implementation of (Bagnara and Schachte 1999),
besides being the fastest implementation of Pos known to date, is the natural
candidate for this combination, since it provides constant-time access to the
set G of the definitely ground variables. Note that the widenings on the Pos component



A binding x = t is grounding with respect to an abstract description if, in all the concrete
computation states approximated by the abstract description, either the variable x is ground
or all the variables in t are ground. For example, when considering an abstract description
sh ∈ SH, the binding x = t is grounding if rel({x}, sh) = ∅ or rel(vars(t), sh) = ∅.

A more precise combination will be considered in Section 7.
Prec. class               Goal Independent                 Goal Dependent
                       O     I     G     F     L       O     I     G     F     L
5 < p < 10             -     -     -     -     -     0.5     -   0.5     -     -
2 < p < 5            0.3     -   0.3     -     -       -     -     -     -     -
0 < p < 2            0.6   0.6   0.6     -   0.6     3.2   3.6   2.3     -   2.7
same precision      95.8  96.1  95.8  96.7  96.1    92.8  92.8  93.7  96.4  93.7
unknown              3.3   3.3   3.3   3.3   3.3     3.6   3.6   3.6   3.6   3.6

Time difference class            % benchmarks
                             Goal Ind.  Goal Dep.
degradation > 1                  2.7       6.8
0.5 < degradation < 1            1.5       0.5
0.2 < degradation < 0.5          3.0       0.9
0.1 < degradation < 0.2          5.7       5.0
both timed out                   3.3       3.6
same time                       81.6      81.9
0.1 < improvement < 0.2            -       0.5
0.2 < improvement < 0.5          0.9       0.5
0.5 < improvement < 1            0.3         -
improvement > 1                  0.9       0.5

Total time class         Goal Ind.               Goal Dep.
                      %1     %2      Δ       %1     %2      Δ
timed out            3.3    3.3      -      3.6    3.6      -
t > 10               8.4    9.0    0.6      7.2    7.2      -
5 < t < 10           0.6    0.3   -0.3      1.4    1.4      -
1 < t < 5            6.6    7.5    0.9      3.2    3.6    0.5
0.5 < t < 1          3.3    2.7   -0.6      5.4    5.4      -
0.2 < t < 0.5        7.2    8.4    1.2     10.4   13.1    2.7
t < 0.2             70.5   68.7   -1.8     68.8   65.6   -3.2

Table 1. SFL2 versus Pos x SFL2.

have been retained. The reason for this choice is that they fire for only a few bench- 
marks and, when coming into play, they rarely affect the precision of the groundness 
analysis: by switching them off we would only obtain a few more time-outs. 
In the SFL component, the set G of definitely ground variables is used 

• to reorder the sequence of bindings in the abstract unification so as to handle 
the grounding ones first; 

• to eliminate the sharing groups containing at least one ground variable; and 

• to recover from previous linearity losses. 
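The first two uses of G listed above can be sketched as follows. This is only an illustration under stated assumptions, not the code of the analyzer: a sharing set is modelled as a Python set of frozensets of variable names, a binding x = t is modelled as the pair (x, vars(t)), and the helper names rel, drop_ground_groups, is_grounding and grounding_first are hypothetical. The recovery of linearity is not shown.

```python
def rel(variables, sh):
    """Sharing groups of sh that contain at least one of the given variables."""
    return {group for group in sh if group & variables}


def drop_ground_groups(sh, ground):
    """Eliminate the sharing groups containing at least one ground variable."""
    return {group for group in sh if not (group & ground)}


def is_grounding(binding, sh):
    """x = t is grounding if x, or every variable of t, occurs in no group."""
    x, t_vars = binding
    return not rel({x}, sh) or not rel(t_vars, sh)


def grounding_first(bindings, sh):
    """Stable reordering that handles the grounding bindings first."""
    return sorted(bindings, key=lambda b: not is_grounding(b, sh))


# Langen's example: f(X,X,Y) = A followed by X = a, with no variable ground yet.
sh = {frozenset("A"), frozenset("X"), frozenset("Y")}
bindings = [("A", {"X", "Y"}), ("X", set())]   # X = a has vars(t) = {}
reordered = grounding_first(bindings, sh)      # X = a is moved to the front
```

Because the sort is stable, the relative order of the non-grounding bindings is left untouched, which corresponds to the "straight" left-to-right order used as the baseline later in the paper.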

The experimental results for Pos x SFL are compared with those obtained for the
domain SFL considered in isolation and reported in Table 1. It can be seen that
a precision improvement is observed in all of the measured quantities except
freeness, affecting up to 3.6% of the programs.

Note that there is a small discrepancy between these results and those of
(Bagnara et al. 2000), where more improvements were reported. The reason is
that the current SFL implementation uses an enhanced abstract unification
operator, fully exploiting the anticipation of the grounding bindings even on
the base domain SFL itself. In contrast, in the earlier SFL implementation used
for the results in (Bagnara et al. 2000), only the syntactically grounding
bindings were anticipated.

As for the timings, even if the figures in the tables seem to contradict what we
claimed in point 4 above, a closer inspection of the detailed results reveals that this
is only due to a very unfortunate interaction between the increased precision given
by Pos and the absence of widening operators on SFL. This state of affairs forces
the analyzer to compute a few, but very expensive, further iterations in the fixpoint
computation.

Because of the reasons detailed above, we believe Pos should be part of the global 
domain employed by any "production analyzer" for CLP languages. That is why, 
for the remaining comparisons, unless otherwise stated, this simple combination 
with the Pos domain is always included. 

5 Tracking Explicit Structural Information 

A way of increasing the precision of almost any analysis domain is by enhancing
it with structural information. For mode analysis, this idea dates back to
(Janssens and Bruynooghe 1992). A more general technique was proposed in
(Cortesi et al. 1994), where the generic structural domain Pat(ℜ) was
introduced. A similar proposal, tailored to sharing analysis, is due to
(Bruynooghe et al. 1994a), where abstract equation systems are considered. In
the experimental evaluation the Pattern(·) construction (Bagnara 1997a;
Bagnara 1997b; Bagnara et al. 2000) is used. This is similar to Pat(ℜ) and
correctly supports the analysis of languages omitting the occurs-check in the
unification procedure as well as those that do not.

The construction Pattern(·) upgrades a domain D (which must support a certain
set of basic operations) with structural information. The resulting domain, where
structural information is retained to some extent, is usually much more precise
than D alone. There are many occasions where these precision gains give rise to
consistent speed-ups. The reason for this is twofold. First, structural information
has the potential of pruning some computation paths on the grounds that they
cannot be followed by the program being analyzed. Second, maintaining a tuple
of terms with many variables, each with its own description, can be cheaper than
computing a description for the whole tuple (Bagnara et al. 2000). Of course, there
is also a price to be paid: in the analysis based on Pattern(D), the elements of
D that are to be manipulated are often bigger (i.e., there are more variables of
interest) than those that arise in analyses that are simply based on D.

A binding x = t is syntactically grounding if vars(t) = ∅. This "syntactic" definition differs
from the "semantic" one provided before in that it does not depend on the information provided
by an abstract description.

When comparing the precision results, the difference in the number of variables 
tracked by the two analyses poses a non-trivial problem. How can we provide a 
fair measure of the precision gain? There is no easy answer to such a question. 
The approach chosen is simple though unsatisfactory: at the end of the analysis, 
first throw away all the structural information in the results and then calculate 
the cardinality of the usual sets. In other words, we only measure how the explicit
structural information in Pattern(D) improves the precision on D itself, which is
only a tiny part of the real gain in accuracy. As shown by the following example,
this solution greatly underestimates the precision improvement coming from the 
integration of structural information. 

Consider a simple but not trivial Prolog program: mastermind. Consider also
the only direct query for which it has been written, '?- play.', and focus the
attention on the procedure extend_code/1. A standard goal-dependent analysis of
the program with the Pos x SFL domain cannot say anything on the successes of
extend_code/1. If the analysis is performed with Pattern(Pos x SFL) the situation
changes radically. Here is what such a domain allows China to derive:

    extend_code([([A|B],C,D)|E]) :-
        list(B), list(E),
        (functor(C,_,1) ; integer(C)),
        (functor(D,_,1) ; integer(D)),
        ground([C,D]), may_share([[A,B,E]]).

This means: "during any execution of the program, whenever extend_code/1
succeeds it will have its argument bound to a term of the form [([A|B],C,D)|E],
where B and E are bound to list cells (i.e., to terms whose principal functor is either
'.'/2 or []/0); C and D are ground and bound to a functor of arity 1 or to an
integer; and pair-sharing may only occur among A, B, and E". Once structural
information has been discarded, the analysis with Pattern(Pos x SFL) only specifies
that extend_code/1 may succeed. Thus, according to our approach to the precision
comparison, explicit structural information gives no improvements in the analysis
of extend_code/1 (which is far from being a fair conclusion).

Of course, structural information is very valuable in itself. For example, when 
exploited for optimized compilation it allows for enhanced clause indexing and sim- 
plified unification. Several other semantics-based program manipulation techniques 
(such as debugging, program specialization, and verification) benefit from this kind 
of information. However, the value of this extra precision could only be measured 
from the point of view of the target application of the analysis. 



This program, which implements the game "Mastermind", was rewritten by
H. Koenig and T. Hoppe after code by M. H. van Emden and is available at
http://www.cs.unipr.it/China/Benchmarks/Prolog/mastermind.pl

Some extra groundness information obtained by the analysis has been omitted for simplicity:
this says that, if A and B turn out to be ground, then E will also be ground.
Prec. class               Goal Independent                 Goal Dependent
                       O     I     G     F     L       O     I     G     F     L
p > 20               7.5   2.7   3.9   2.1   3.3     6.3   1.4   3.6   1.8   3.6
10 < p < 20          3.9   2.1   2.7     -   2.4     2.7   2.3   1.4     -   2.7
5 < p < 10           4.5   1.8   2.7   2.4   2.4     1.8   0.9   2.3   0.9   1.4
2 < p < 5            7.5   6.0   3.9   2.7   5.1     2.7   3.2   1.4   1.8   2.3
0 < p < 2            7.8   9.0   6.6   6.9  12.0     2.3   4.5   1.8   1.8   5.0
same precision      61.7  71.7  73.5  79.2  67.8    74.2  78.3  80.1  84.2  75.1
unknown              6.6   6.6   6.6   6.6   6.6     9.5   9.5   9.5   9.5   9.5
p < 0                0.3     -     -     -   0.3     0.5     -     -     -   0.5

Time diff. class                 % benchmarks
                             Goal Ind.  Goal Dep.
degradation > 1                 11.7      17.6
0.5 < degradation < 1            1.2       0.9
0.2 < degradation < 0.5          3.6       4.1
0.1 < degradation < 0.2          1.5       4.1
both timed out                   3.3       3.6
same time                       70.8      66.5
0.1 < improvement < 0.2          0.9       0.5
0.2 < improvement < 0.5          1.5         -
0.5 < improvement < 1            0.6       0.5
improvement > 1                  4.8       2.3

Total time class         Goal Ind.               Goal Dep.
                      %1     %2      Δ       %1     %2      Δ
timed out            3.3    6.6    3.3      3.6    9.5    5.9
t > 10               9.0    8.4   -0.6      7.2    8.6    1.4
5 < t < 10           0.3    1.5    1.2      1.4    1.8    0.5
1 < t < 5            7.5    6.6   -0.9      3.6    5.0    1.4
0.5 < t < 1          2.7    3.3    0.6      5.4    3.2   -2.3
0.2 < t < 0.5        8.4   10.2    1.8     13.1   13.6    0.5
t < 0.2             68.7   63.3   -5.4     65.6   58.4   -7.2

Table 2. Pos x SFL2 versus Pattern(Pos x SFL2).

Thus the precision of the domain Pos x SFL has been compared with that
obtained using the domain Pattern(Pos x SFL) and the results reported in Table 2.
It can be seen that, for goal-independent analysis, on one third of the benchmarks
compared there is a precision improvement in at least one of the measured
quantities; the same happens for one sixth of the benchmarks in the case of
goal-dependent analysis. Moreover, the increase in precision can be considerable,
as testified by the percentages of benchmarks falling in the higher precision classes.

The reader may be surprised, as the authors were, to see that in some cases the
precision actually decreased. Indeed, to the best of our knowledge, this possibility
has escaped all previous research work investigating this kind of abstract domain
enhancement, including (Cortesi et al. 1994; Bruynooghe et al. 1994a; Bagnara 1997a).
The reason for these precision losses lies in a subtle interaction between the explicit
structural information and the underlying abstract unification operator.

When using the base domain Pos x SFL, the abstract evaluation of a single
syntactic binding, such as x = f(y, z), directly corresponds to a single application of
the amgu operator. In contrast, when computing on Pattern(Pos x SFL), it may well
happen that the computed abstract description already contains the information
that variable x is bound to a term, such as f(g(w), w). As a consequence, after
peeling the principal functor f/2, the abstract computation should proceed by
evaluating, on the base domain Pos x SFL, the set of bindings {y = g(w), z = w}.
Here the problem is that, as already noted, the amgu operator on the base domain
Pos x SFL is not commutative. While this improvement in the data used by the
abstract computation very often allows for a corresponding increase in the precision
of the result, in rare situations it may happen that a sub-optimal ordering of the
bindings is chosen, incurring a precision loss.
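The "peeling" step described above can be illustrated with a small sketch. It is not the Pattern(·) implementation: terms are modelled as nested tuples whose first component is the functor name, variables as plain strings, and the function name peel is hypothetical. Given the term known for x and the term on the right-hand side of the binding, it returns the list of simpler bindings that would be passed to the base domain.

```python
def peel(known, new):
    """Match two terms position by position and collect the residual bindings."""
    bindings = []

    def walk(a, b):
        if isinstance(a, tuple) and isinstance(b, tuple):
            # Both are compound: the principal functors must agree.
            if a[0] != b[0] or len(a) != len(b):
                raise ValueError("functor clash: the unification fails")
            for sub_a, sub_b in zip(a[1:], b[1:]):
                walk(sub_a, sub_b)
        elif isinstance(a, tuple):
            bindings.append((b, a))      # keep the variable on the left
        elif a != b:
            bindings.append((a, b))

    walk(known, new)
    return bindings


# x is already known to be bound to f(g(w), w); the program binds x = f(y, z).
residual = peel(("f", "y", "z"), ("f", ("g", "w"), "w"))
```

The list comes out in argument order, and this is exactly where the order sensitivity enters: the base domain now has to process two bindings where it previously processed one, and nothing in the peeling step says which of them should be evaluated first.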

It should be noted that such a negative interaction with the explicit
structural information is only possible when the underlying domain implements
non-commutative abstract operators. In particular, this phenomenon could not be
observed when computing on Pattern(SH) or Pattern(Pos).

One issue that should be resolved is whether the improvements provided by
explicit structural information subsume those previously obtained for the simple
combination with Pos. Intuitively, it would seem that this cannot happen, since these
two enhancements are based on different kinds of information: while the Pattern(·)
construction encodes some definite structural information, the precision gain due
to using Pos rather than just Def only stems from disjunctive groundness
dependencies. However, the impact of these techniques on the overall analysis is really
intricate and some overlapping cannot be excluded a priori: for instance, both
techniques affect the ordering of bindings in the computation of abstract unification on
SFL. In order to provide some experimental evidence for this qualitative reasoning,
the precision results are computed for the simpler domain Pattern(SFL) and then
compared with those obtained for the domain Pattern(Pos x SFL). Since the main
differences between Tables 1 and 3 can be explained by discrepancies in the
numbers of programs that timed out, these results confirm our expectations that these
two enhancements are effectively orthogonal.

Similar experimental evaluations, but based on the abstract equation systems of
(Bruynooghe et al. 1994a), were reported by A. Mulkers et al. in
(Mulkers et al. 1994; Mulkers et al. 1995). Here a depth-k abstraction (replacing
all subterms occurring at a depth greater than or equal to k with fresh abstract
variables) is conducted on a small benchmark suite (19 programs) for values of k
between 0 and 3. The
domain they employed was not suitable for the analysis of real programs and, in



This happens for the program attractions2 in the case of goal-independent analysis and for 
the program semi in the case of goal-dependent analysis. 
Prec. class               Goal Independent                 Goal Dependent
                       O     I     G     F     L       O     I     G     F     L
5 < p < 10             -     -     -     -     -     0.5     -   0.5     -     -
2 < p < 5            0.3     -   0.3     -     -       -   0.5     -     -     -
0 < p < 2              -     -     -     -     -     3.2   3.2   2.7     -   2.7
same precision      93.1  93.4  93.1  93.4  93.4    86.4  86.4  86.9  90.0  87.3
unknown              6.6   6.6   6.6   6.6   6.6    10.0  10.0  10.0  10.0  10.0

Time diff. class                 % benchmarks
                             Goal Ind.  Goal Dep.
degradation > 1                  5.7       7.7
0.5 < degradation < 1            2.4       0.5
0.2 < degradation < 0.5          3.6       5.4
0.1 < degradation < 0.2          5.4       2.7
both timed out                   6.6       9.5
same time                       75.6      73.8
0.1 < improvement < 0.2            -         -
0.2 < improvement < 0.5          0.6         -
0.5 < improvement < 1              -         -
improvement > 1                    -       0.5

Total time class         Goal Ind.               Goal Dep.
                      %1     %2      Δ       %1     %2      Δ
timed out            6.6    6.6      -     10.0    9.5   -0.5
t > 10               8.1    8.4    0.3      7.7    8.6    0.9
5 < t < 10           1.5    1.5      -      2.3    1.8   -0.5
1 < t < 5            5.1    6.6    1.5      4.5    5.0    0.5
0.5 < t < 1          3.9    3.3   -0.6      3.2    3.2      -
0.2 < t < 0.5        7.2   10.2    3.0     10.9   13.6    2.7
t < 0.2             67.5   63.3   -4.2     61.5   58.4   -3.2

Table 3. Pattern(SFL2) versus Pattern(Pos x SFL2).

fact, even the analysis of a modest-sized program like ann could only be carried out 
with depth-0 abstraction (i.e., without any structural information). Such a problem 
in finding practical analyzers that incorporated structural information with sharing 
analysis was not unique to this work: there was at least one other previous attempt 
to evaluate the impact of structural information on sharing analysis that failed 
because of combinatorial explosion [A. Cortesi, personal communication, 1996]. 
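For readers unfamiliar with depth-k abstraction, the following sketch shows the syntactic part of the operation only. It is an illustration under stated assumptions, not the abstract equation systems of the cited work: terms are nested tuples whose first component is the functor name, atoms and variables are strings, the root is taken to be at depth 0, and the names depth_k and _V0, _V1, ... are hypothetical.

```python
import itertools


def depth_k(term, k, fresh=None):
    """Replace every subterm occurring at depth >= k by a fresh variable."""
    if fresh is None:
        fresh = (f"_V{i}" for i in itertools.count())
    if k <= 0:
        return next(fresh)               # cut here: the structure below is forgotten
    if isinstance(term, tuple):
        functor, args = term[0], term[1:]
        return (functor,) + tuple(depth_k(arg, k - 1, fresh) for arg in args)
    return term                          # atom or variable above the cut


term = ("f", ("g", "X"), "Y")
cut0 = depth_k(term, 0)    # no structural information at all
cut1 = depth_k(term, 1)    # only the principal functor survives
cut2 = depth_k(term, 2)
```

In a real analysis each fresh variable would carry an abstract description (sharing, freeness, and so on) summarizing the subterm it replaces; the sketch drops that part entirely.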

What makes the more realistic experimentation now possible is the adoption
of the non-redundant domain PSD, where the exponential star-union operation is
replaced by the quadratic self-bin-union. Note that, even if biased by the absence
of widenings, the timings reported in Table 2 show that the Pattern(·) construction
is computationally feasible. Indeed, as demonstrated by the results reported in
(Bagnara et al. 2000), an analyzer that incorporates a carefully designed structural
information component, besides being more precise, can also be very efficient.

The results obtained in this section demonstrate that there is a relevant amount 
of sharing information that is not detected when using the classical set-sharing 
domains. Therefore, in order to provide an experimental evaluation that is as sys- 
tematic as possible, in all of the remaining experiments the comparison is performed 
both with and without explicit structural information. 

6 Reordering the Non-Grounding Bindings 

As already explained in Section 4, the results of abstract unification on SFL may
depend on the order in which the bindings are considered and will be improved if
the grounding bindings are considered first. This heuristic, which has been used for
all the experiments in this paper, is well-known: in the literature all the examples
that illustrate the non-commutativity of the abstract mgu on SFL use a grounding
binding. However, as observed in Section 5, the problem is more general than that.

To illustrate this, suppose that VI = {u, v, w, x, y, z} is the set of relevant
variables, and consider the SFL element

    d = ({vy, wy, xy, yz}, ∅, {u, x, z}),

where no variable is free and u, x, and z are linear, together with the bindings
v = w and x = y. Then, applying amgu to these bindings in the given ordering, we have:

    d_1     = amgu(d, v = w)
            = ({vwy, xy, yz}, ∅, {u, x, z}),
    d_{1,2} = amgu(d_1, x = y)
            = ({vwxy, vwxyz, xy, xyz}, ∅, {u, z}).

Using the reverse ordering, we have:

    d_2     = amgu(d, x = y)
            = ({vwxy, vwxyz, vxy, vxyz, wxy, wxyz, xy, xyz}, ∅, {u, z}),
    d_{2,1} = amgu(d_2, v = w)
            = ({vwxy, vwxyz, xy, xyz}, ∅, {u}).

Thus d_{2,1} loses the linearity of z (which, in turn, could cause bigger precision losses
later in the analysis).
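The two computations above can be replayed mechanically. The sketch below is not the paper's definition of amgu, which is given in an earlier section and remains authoritative: it handles only bindings between two variables, and its case analysis (no star-union when either side is free; a star-union saved only for a side whose counterpart is linear and independent; freeness and linearity updated as in the comments) is an assumption chosen so that it reproduces the figures in the example. All function names are hypothetical.

```python
def sharing(*groups):
    """Build a sharing set from strings such as 'vwy'."""
    return {frozenset(g) for g in groups}

def rel(variables, sh):
    return {g for g in sh if g & variables}

def bin_union(sh1, sh2):
    return {a | b for a in sh1 for b in sh2}

def star(sh):
    closure = set(sh)
    while True:
        bigger = closure | bin_union(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger

def vars_of(sh):
    return set().union(*sh)

def amgu(d, x, y, vi):
    """Assumed abstract unification for a variable-variable binding x = y."""
    sh, f, l = d
    sh_x, sh_y = rel({x}, sh), rel({y}, sh)
    rest = sh - sh_x - sh_y
    ind = not (sh_x & sh_y)                  # x and y share no group
    free_x, free_y = x in f, y in f
    lin_x, lin_y = x in l, y in l
    if free_x or free_y or (lin_x and lin_y and ind):
        new = bin_union(sh_x, sh_y)          # no star-union
    elif lin_x and ind:
        new = bin_union(star(sh_x), sh_y)    # one star-union
    elif lin_y and ind:
        new = bin_union(sh_x, star(sh_y))    # one star-union
    else:
        new = bin_union(star(sh_x), star(sh_y))
    sh1 = rest | new
    vx, vy = vars_of(sh_x), vars_of(sh_y)
    # Freeness survives only where the other side is free as well.
    if free_x and free_y:
        f1 = set(f)
    elif free_x:
        f1 = f - vx
    elif free_y:
        f1 = f - vy
    else:
        f1 = f - (vx | vy)
    # Linearity is lost by the variables that may now be bound to a
    # possibly non-linear term.
    if lin_x and lin_y:
        l2 = l - (vx & vy)
    elif lin_x:
        l2 = l - vx
    elif lin_y:
        l2 = l - vy
    else:
        l2 = l - (vx | vy)
    return sh1, f1, (vi - vars_of(sh1)) | f1 | l2

VI = set("uvwxyz")
d = (sharing("vy", "wy", "xy", "yz"), set(), set("uxz"))
d12 = amgu(amgu(d, "v", "w", VI), "x", "y", VI)   # v = w first, then x = y
d21 = amgu(amgu(d, "x", "y", VI), "v", "w", VI)   # x = y first, then v = w
```

Under these assumptions both orderings end with the same sharing component, while the linear sets differ: d12 keeps u and z, d21 keeps only u.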

In principle, optimality can be obtained by adopting the brute-force approach:
trying all the possible orderings of the non-grounding bindings. However, this is
clearly not feasible. While lacking a better alternative, it is reasonable to look for
heuristics that can be applied in the context of a local search paradigm: at each
step, the next binding for the amgu procedure is chosen by evaluating the effect of
its abstract execution, considered in isolation, on the precision of the analysis.

Elements of SH are written in a simplified notation, omitting the inner braces. For instance,
the set {{x}, {x,y}, {x,z}, {x,y,z}} is written as {x, xy, xz, xyz}.

Suppose the number of independent pairs is taken as a measure of precision. Then,
at each step, for each of the bindings under consideration, the new component sh',
as given by the definition of amgu on SFL, must be computed. However, because the
computation of sh' is the most costly operation to be performed in the computation of the
amgu operator, a direct application of this heuristic does not appear to be feasible.
As an alternative, consider a heuristic based on the number of star-unions that 
have to be computed. Star-unions are likely to cause large losses in the number of 
independent pairs that are found. As only non-grounding bindings are considered, 
any binding requiring the computation of a star-union will need the star-union 
even if it is delayed, although a binding that does not require the star-union may 
require it if its computation is postponed: its variables may lose their freeness, 
linearity or independence as a result of evaluating the other bindings. It follows 
that one potential heuristic is: "delay the bindings requiring star-unions as much as 
possible" . In the next example, by adopting this heuristic, the linearity of variable 
y is preserved. 

Consider the application of the bindings x = z and v = w to the following
abstract description:

    d = ({vw, wx, wy, z}, ∅, {u, v, x, y}).

Since x is linear and independent from z, computing amgu(d, x = z) requires one
star-union, while two star-unions are needed when computing amgu(d, v = w)
because v and w may share. Thus, with the proposed heuristic, x = z is applied
before v = w, giving:

    d_1     = amgu(d, x = z)
            = ({vw, wxz, wy}, ∅, {u, v, y}),
    d_{1,2} = amgu(d_1, v = w)
            = ({vw, vwxyz, vwxz, vwy}, ∅, {u, y}).

In contrast, if v = w is applied first, we have:

    d_2     = amgu(d, v = w)
            = ({vw, vwx, vwxy, vwy, z}, ∅, {u, x, y}),
    d_{2,1} = amgu(d_2, x = z)
            = ({vw, vwxyz, vwxz, vwy}, ∅, {u}).

Note that the same number of independent pairs is computed in both cases.
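The ordering criterion used in this example can be written down compactly. The sketch below is an assumption-laden illustration, not the implemented heuristic: it handles variable-variable bindings only, counts star-unions according to the reading given in the text (none if either variable is free, two if the variables may share, otherwise one for each side that is not linear), and ranks the bindings once on the initial description instead of re-evaluating them after every step as a local search would. The names stars_needed and delay_star_unions are hypothetical.

```python
def rel(variables, sh):
    """Sharing groups of sh that contain at least one of the given variables."""
    return {g for g in sh if g & variables}


def stars_needed(d, x, y):
    """Number of star-unions assumed necessary to evaluate x = y on d."""
    sh, free, lin = d
    if x in free or y in free:
        return 0
    if rel({x}, sh) & rel({y}, sh):      # x and y may share
        return 2
    return 2 - (x in lin) - (y in lin)


def delay_star_unions(d, bindings):
    """Stable sort: bindings needing fewer star-unions are applied first."""
    return sorted(bindings, key=lambda b: stars_needed(d, *b))


d = ({frozenset(g) for g in ("vw", "wx", "wy", "z")}, set(), set("uvxy"))
order = delay_star_unions(d, [("v", "w"), ("x", "z")])
```

On the description of the example, x = z is ranked before v = w, which is the order that preserves the linearity of y.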

It should be noted that this heuristic, considered in isolation, is not a general 
solution and can actually lead to precision losses. The problem is that, if a binding 
that needs a star-union is delayed, then, when the star-union is computed, it may 
be done on a larger sharing-set, forcing more (independent) pairs of variables into 
the same sharing group. 

Consider the application of the bindings u = x and v = w to the abstract
description

    d = ({u, uw, v, w, xy, xz}, {u, x}, {u, x}).

Since x and u are both free variables, no star-union is needed in the computation of
amgu(d, u = x), while two star-unions are needed when computing amgu(d, v = w).

    d_1     = amgu(d, u = x)
            = ({uwxy, uwxz, uxy, uxz, v, w}, {u, x}, {u, x}),
    d_{1,2} = amgu(d_1, v = w)
            = ({uvwxy, uvwxyz, uvwxz, uxy, uxz, vw}, ∅, ∅).

Using the other ordering, we have:

    d_2     = amgu(d, v = w)
            = ({u, uvw, vw, xy, xz}, {x}, {x}),
    d_{2,1} = amgu(d_2, u = x)
            = ({uvwxy, uvwxz, uxy, uxz, vw}, ∅, ∅).

Note that in d_{2,1} variables y and z are independent, whereas they may share in d_{1,2}.
Thus, in this example, by delaying the only binding that requires the star-unions,
v = w, the number of known independent pairs is decreased.

Another possibility is to consider a heuristic that uses the numbers of free and
linear variables as a measure of precision for local optimization. That is, it chooses
first those bindings for which these numbers are maximal. However, the last example
shown above is evidence that such a proposal may also cause precision losses
(the binding u = x would be chosen first as it preserves the freeness of variable u).

In order to evaluate the effects of these two heuristics on real programs, we have
implemented and compared them with respect to the "straight" abstract
computation, which considers the non-grounding bindings using the left-to-right order.
The results reported in Tables 4 and 5 can be summarized as follows:

1. the precision on the groundness and freeness components is not affected; 

2. the precision on the independent pairs and linearity components is rarely 
affected, in particular when considering goal-dependent analyses; 

3. even for real programs, as was the case for the artificial examples given above, 
the precision can be increased as well as decreased. 

Looking at Tables 4 and 5 it can be seen that the heuristic based on freeness and
linearity information is slightly better than the use of the straight order, which, in
its turn, is slightly better than the heuristic based on the number of star-unions.

Clearly, since these results could not be generalized to other orderings, our inves- 
tigation cannot be considered really conclusive. Besides designing "smarter" heuris- 
tics, it would be interesting to provide a kind of responsiveness test for the underly- 
ing domain with respect to the choice of ordering for the non-grounding bindings: a 



The base domain is Pos x SFL, both with and without structural information.
Goal Independent        without Struct Info                with Struct Info
Prec. class            O     I     G     F     L       O     I     G     F     L
0 < p < 2            0.9     -     -     -   0.9       -     -     -     -     -
same precision      94.6  95.5  96.4  96.4  95.5    91.3  91.3  93.1  93.1  93.1
unknown              3.6   3.6   3.6   3.6   3.6     6.9   6.9   6.9   6.9   6.9
-2 < p < 0           0.9   0.9     -     -     -     1.8   1.8     -     -     -

Goal Dependent          without Struct Info                with Struct Info
Prec. class            O     I     G     F     L       O     I     G     F     L
same precision      96.4  96.4  96.4  96.4  96.4    90.5  90.5  90.5  90.5  90.5
unknown              3.6   3.6   3.6   3.6   3.6     9.5   9.5   9.5   9.5   9.5

Time diff. class                Goal Ind.           Goal Dep.
                             w/o SI  with SI     w/o SI  with SI
degradation > 1                 4.5      3.0        7.2      4.1
0.5 < degradation < 1           0.6      0.3          -        -
0.2 < degradation < 0.5         2.4      0.9        0.5      0.5
0.1 < degradation < 0.2         1.5      0.6        0.5      0.5
both timed out                  3.0      6.3        3.6      9.5
same time                      80.7     80.7       85.5     76.9
0.1 < improvement < 0.2         1.5      1.2        0.5      0.5
0.2 < improvement < 0.5         1.8      1.2        1.4      2.3
0.5 < improvement < 1           0.9      0.6          -      0.9
improvement > 1                 3.0      5.1        0.9      5.0

Total time class             Goal Independent                       Goal Dependent
                      without SI          with SI          without SI          with SI
                    %1    %2     Δ     %1    %2     Δ     %1    %2     Δ     %1    %2     Δ
timed out          3.3   3.3     -    6.6   6.6     -    3.6   3.6     -    9.5   9.5     -
t > 10             9.0   8.1  -0.9    8.4   9.0   0.6    7.2   7.7   0.5    8.6   8.1  -0.5
5 < t < 10         0.3   0.9   0.6    1.5   1.2  -0.3    1.4   0.9  -0.5    1.8   2.7   0.9
1 < t < 5          7.5   7.5     -    6.6   6.3  -0.3    3.6   3.2  -0.5    5.0   4.1  -0.9
0.5 < t < 1        2.7   2.4  -0.3    3.3   3.0  -0.3    5.4   5.9   0.5    3.2   3.6   0.5
0.2 < t < 0.5      8.4   9.3   0.9   10.2  10.5   0.3   13.1  12.7  -0.5   13.6  13.1  -0.5
t < 0.2           68.7  68.4  -0.3   63.3  63.3     -   65.6  66.1   0.5   58.4  58.8   0.5

Table 4. The heuristic based on the number of star-unions.



simple test consists in measuring how much the precision can be affected, in either
way, by the application of an almost arbitrary order. This is the motivation for the
comparison reported in Table 6, where the order is from right to left, the reverse
of the usual one. As for the results given in Tables 4 and 5, the number of changes
to the precision observed in Table 6 is small and all the observations made above
Goal Independent        without Struct Info                with Struct Info
Prec. class            O     I     G     F     L       O     I     G     F     L
5 < p < 10           0.3     -     -     -   0.3     0.3     -     -     -   0.3
0 < p < 2            0.9     -     -     -   0.9     2.7   2.4     -     -   0.3
same precision      94.3  95.5  96.4  96.4  95.2    89.5  90.1  93.4  93.4  92.8
unknown              3.6   3.6   3.6   3.6   3.6     6.6   6.6   6.6   6.6   6.6
-2 < p < 0           0.6   0.6     -     -     -     0.9   0.9     -     -     -
p < -20              0.3   0.3     -     -     -       -     -     -     -     -

Goal Dependent          without Struct Info                with Struct Info
Prec. class            O     I     G     F     L       O     I     G     F     L
0 < p < 2            0.5     -     -     -   0.5       -     -     -     -     -
same precision      94.6  95.0  95.5  95.5  95.0    89.6  89.6  89.6  89.6  89.6
unknown              4.5   4.5   4.5   4.5   4.5    10.4  10.4  10.4  10.4  10.4
-20 < p < -10        0.5   0.5     -     -     -       -     -     -     -     -

Time diff. class                Goal Ind.           Goal Dep.
                             w/o SI  with SI     w/o SI  with SI
degradation > 1                 6.9      4.8        8.1      7.7
0.5 < degradation < 1           2.1      1.5        1.8      0.5
0.2 < degradation < 0.5         2.4      1.8        1.8      2.7
0.1 < degradation < 0.2         1.2      3.3        2.3      3.2
both timed out                  2.4      5.7        3.6      9.0
same time                      77.4     73.5       78.7     71.9
0.1 < improvement < 0.2         1.2      0.3          -        -
0.2 < improvement < 0.5         0.6      1.8        0.9      0.9
0.5 < improvement < 1           0.9        -        0.5        -
improvement > 1                 4.8      7.2        2.3      4.1

Total time class             Goal Independent                       Goal Dependent
                      without SI          with SI          without SI          with SI
                    %1    %2     Δ     %1    %2     Δ     %1    %2     Δ     %1    %2     Δ
timed out          3.3   2.7  -0.6    6.6   5.7  -0.9    3.6   4.5   0.9    9.5  10.0   0.5
t > 10             9.0   9.6   0.6    8.4   8.7   0.3    7.2   6.8  -0.5    8.6   7.7  -0.9
5 < t < 10         0.3   2.1   1.8    1.5   1.8   0.3    1.4   1.4     -    1.8   2.7   0.9
1 < t < 5          7.5   6.0  -1.5    6.6   6.9   0.3    3.6   4.5   0.9    5.0   5.0     -
0.5 < t < 1        2.7   3.0   0.3    3.3   3.9   0.6    5.4   4.1  -1.4    3.2   3.6   0.5
0.2 < t < 0.5      8.4   9.9   1.5   10.2  13.3   3.0   13.1  13.1     -   13.6  15.4   1.8
t < 0.2           68.7  66.6  -2.1   63.3  59.6  -3.6   65.6  65.6     -   58.4  55.7  -2.7

Table 5. The heuristic based on freeness and linearity.
still hold. Surprisingly, this reversed ordering provides marginally better precision
results than those obtained using the considered heuristics.

7 The Reduced Product between Pos and Sharing 

The overlap between the information provided by Pos and the information provided
by Sharing, mentioned earlier, means that the Cartesian product Pos × SFL
contains redundancy, that is, there is more than one element that can characterize
the same set of concrete computational states.

In (Bagnara et al. 2000), two techniques that are able to remove some of this
redundancy were experimentally evaluated. One of these aims at identifying those
pairs of variables (x, y) for which the Boolean formula of the Pos component
implies the binary disjunction x ∨ y. In such a case, it is always safe to assume that
the variables x and y are independent. Since the number of independent pairs is
one of the quantities explicitly measured, this enhancement has the potential for
"immediate" precision gains. The other technique exploits the knowledge of the
sets of ground-equivalent variables: the variables in $e \subseteq VI$ are ground-equivalent
in $\phi \in \mathit{Pos}$ if and only if, for each $x, y \in e$, $\phi \models (x \leftrightarrow y)$. For a description
of how these sets can be used to improve sharing analysis, the reader is referred
to (Bagnara et al. 2000). The main motivation for experimenting with this specific
reduction was the ease of its implementation, since all the needed information
can easily be recovered from the already computed E component of the GER
implementation of Pos (Bagnara and Schachte 1999). The experimental evaluation
results given in (Bagnara et al. 2000) for these two techniques show precision
improvements with only three of the programs and, also, only with respect to the
number of independent pairs that were found. Those results apply only to these
limited forms of reduction, so they cannot be considered a complete account of all the
possible precision gains.
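
Both limited reductions amount to simple checks on the models of the Pos component. The following is a minimal Python sketch, assuming that a Boolean function is given extensionally as a set of models, each model being the set of variables it makes true. The function names and the sample formula are illustrative assumptions and are not taken from the implementation evaluated here.

```python
from itertools import combinations

def implies_or(models, x, y):
    """True if every model makes x or y true, i.e. phi entails (x or y)."""
    return all(x in m or y in m for m in models)

def ground_equivalent(models, x, y):
    """True if every model agrees on x and y, i.e. phi entails (x <-> y)."""
    return all((x in m) == (y in m) for m in models)

def independent_pairs(models, variables):
    """Pairs of variables that can safely be assumed to be independent."""
    return {frozenset(pair)
            for pair in combinations(sorted(variables), 2)
            if implies_or(models, *pair)}

# Hypothetical example: phi = (x or y) and (y <-> z) over VI = {x, y, z}.
phi = {frozenset("x"), frozenset("yz"), frozenset("xyz")}
pairs = independent_pairs(phi, "xyz")
```

In this example the pairs (x, y) and (x, z) are reported as independent, while y and z are ground-equivalent but not known to be independent.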

The full reduced product (Cousot and Cousot 1979) between Pos and Sharing has
been elegantly characterized in (Codish et al. 1999), where set-sharing à la Jacobs
and Langen is expressed in terms of elements of the Pos domain itself. Let $[\phi]_{VI}$
denote the set of all the models of the Boolean function $\phi$ defined over the set of
variables VI. Then, the isomorphism maps each set-sharing element $sh \in \mathit{SH}$ into
the Boolean formula $\phi \in \mathit{Pos}$ such that

$$[\phi]_{VI} \stackrel{\mathrm{def}}{=} \{\, VI \setminus S \mid S \in sh \,\} \cup \{ VI \}.$$

The sharing information encoded by an element $(\phi_g, \phi_{sh}) \in \mathit{Pos} \times \mathit{Pos}$ can be
improved by replacing the second component (that is, the Boolean formula describing
set-sharing information) with the conjunction $\phi_g \land \phi_{sh}$. The reader is referred to

It is worth noting that the only precision improvement reported in Table 6 for the goal-dependent
analysis with structural information (caused by the program semi) corresponds to a precision
decrease reported in an earlier table. This confirms that, as informally discussed earlier, such a
precision decrease was due to the non-commutativity of the amgu operator on Pos × SFL.

Note that this observation (the independence of x and y when the Pos component implies x ∨ y)
dates back, at least, to (Crnogorac et al. 1996).
Goal Independent

                       without Struct Info               with Struct Info
Prec. class            O     I     G     F     L         O     I     G     F     L
5 < p ≤ 10            0.3    -     -     -    0.3       0.3    -     -     -    0.3
0 < p ≤ 2             0.9   0.3    -     -    0.6       4.2   3.0    -     -    1.2
same precision       94.3  95.2  96.4  96.4  95.5      87.7  89.2  93.4  93.4  91.9
unknown               3.6   3.6   3.6   3.6   3.6       6.6   6.6   6.6   6.6   6.6
-2 ≤ p < 0            0.6   0.6    -     -     -        1.2   1.2    -     -     -
p < -20               0.3   0.3    -     -     -         -     -     -     -     -

Goal Dependent

                       without Struct Info               with Struct Info
Prec. class            O     I     G     F     L         O     I     G     F     L
0 < p ≤ 2             0.5    -     -     -    0.5       0.5    -     -     -    0.5
same precision       95.5  95.9  96.4  96.4  95.9      90.0  90.5  90.5  90.5  90.0
unknown               3.6   3.6   3.6   3.6   3.6       9.5   9.5   9.5   9.5   9.5
-20 ≤ p < -10         0.5   0.5    -     -     -         -     -     -     -     -

                                 Goal Ind.            Goal Dep.
Time diff. class              w/o SI  with SI      w/o SI  with SI
degradation > 1                 4.2     6.0          4.5     6.8
0.5 < degradation ≤ 1           0.6     0.6           -       -
0.2 < degradation ≤ 0.5         2.4     1.5          1.4     0.9
0.1 < degradation ≤ 0.2         1.8     0.9          0.5      -
both timed out                  2.4     5.7          3.6     9.0
same time                      78.3    76.2         82.8    74.2
0.1 < improvement ≤ 0.2         1.5     1.2          1.8     0.9
0.2 < improvement ≤ 0.5         1.8     0.3          1.4     1.8
0.5 < improvement ≤ 1           0.9     0.9          0.5     0.5
improvement > 1                 6.0     6.6          3.6     5.9

                            Goal Independent                       Goal Dependent
                      without SI         with SI           without SI         with SI
Total time class     %1    %2     Δ     %1    %2     Δ     %1    %2     Δ     %1    %2     Δ
timed out            3.3   2.7  -0.6    6.6   5.7  -0.9    3.6   3.6    -     9.5   9.0  -0.5
t > 10               9.0   8.7  -0.3    8.4   9.9   1.5    7.2   7.7   0.5    8.6   8.1  -0.5
5 < t ≤ 10           0.3   1.8   1.5    1.5   1.5    -     1.4   0.5  -0.9    1.8   2.7   0.9
1 < t ≤ 5            7.5   6.9  -0.6    6.6   6.0  -0.6    3.6   3.2  -0.5    5.0   4.5  -0.5
0.5 < t ≤ 1          2.7   2.4  -0.3    3.3   2.7  -0.6    5.4   5.4    -     3.2   3.6   0.5
0.2 < t ≤ 0.5        8.4   8.7   0.3   10.2  11.1   0.9   13.1  13.1    -    13.6  12.2  -1.4
t ≤ 0.2             68.7  68.7    -    63.3  63.0  -0.3   65.6  66.5   0.9   58.4  59.7   1.4

Table 6. Reversing the ordering of the non-grounding bindings.
Goal Independent

                       without Struct Info               with Struct Info
Prec. class            O     I     G     F     L         O     I     G     F     L
5 < p ≤ 10             -     -     -     -     -        0.3   0.3    -     -     -
2 < p ≤ 5             0.3   0.3    -     -     -         -     -     -     -     -
0 < p ≤ 2             2.7   2.7    -     -    0.6       3.9   3.9    -     -    0.6
same precision       86.1  86.1  89.2  89.2  88.6      80.7  80.7  84.9  84.9  84.3
unknown              10.8  10.8  10.8  10.8  10.8      15.1  15.1  15.1  15.1  15.1

Goal Dependent

                       without Struct Info               with Struct Info
Prec. class            O     I     G     F     L         O     I     G     F     L
p > 20                0.5   0.5    -     -     -         -     -     -     -     -
10 < p ≤ 20            -     -     -     -     -        0.5   0.5    -     -     -
5 < p ≤ 10             -     -     -     -     -        0.5   0.5    -     -     -
0 < p ≤ 2             2.7   2.7    -     -     -        2.7   2.7    -     -     -
same precision       89.1  89.1  92.3  92.3  92.3      77.8  77.8  81.4  81.4  81.4
unknown               7.7   7.7   7.7   7.7   7.7      18.6  18.6  18.6  18.6  18.6

Table 7. Pos × SFL2 versus Pos ⊗ SFL.

(Codish et al. 1999) for a complete account of this composition and a justification
of its correctness.

This specification of the reduced product can be reformulated, using the standard
set-sharing representation for the second component, to define a reduction
procedure $\mathrm{reduce} \colon \mathit{Pos} \times \mathit{SH} \to \mathit{SH}$ such that, for all $\phi_g \in \mathit{Pos}$, $sh \in \mathit{SH}$,

$$\mathrm{reduce}(\phi_g, sh) \stackrel{\mathrm{def}}{=} \{\, S \in sh \mid (VI \setminus S) \in [\phi_g]_{VI} \,\}.$$

The enhanced integration of Pos and SFL, based on the above reduction operator,
is denoted here by Pos ⊗ SFL. From a formal point of view, this is not the reduced
product between Pos and SFL: while there is a complete reduction between Pos
and SH, the same does not necessarily hold for the combination with freeness and
linearity information. Also note that the domain Pos ⊗ SFL is strictly more precise
than the domain $\mathit{Sh}^{\mathit{PSh}}$, defined in (Scozzari 2000) for pair-sharing analysis. This is
because the domain $\mathit{Sh}^{\mathit{PSh}}$ is the reduced product of a strict abstraction of Pos and
a strict abstraction of SH.

When using the domain PSD in place of SH, the 'reduce' operator specified
above can interact in subtle ways with an implementation removing the ρ-redundant
sharing groups from the elements of PSD. The following is an example where such
an interaction provides results that are not correct.

Let $VI = \{x, y, z\}$ and $sh = \{xy, xz, yz, xyz\} \in \mathit{PSD}$ be the current set-sharing
description. Suppose that the implementation internally represents sh by using the
ρ-reduced element $sh_{red} = \{xy, xz, yz\}$, so that $sh = \rho(sh_{red})$. Suppose also that
the groundness description computed on the domain Pos is $\phi_g = (x \leftrightarrow y \leftrightarrow z)$.
Note that we have $[\phi_g]_{VI} = \{\emptyset, \{x, y, z\}\}$. Then we have

$$sh' = \mathrm{reduce}(\phi_g, sh) = \{xyz\};$$
$$sh'_{red} = \mathrm{reduce}(\phi_g, sh_{red}) = \emptyset.$$

The two Pos-reduced elements $sh'$ and $sh'_{red}$ are not equivalent, even modulo ρ.

Note that the above example does not mean that the reduced product between
Pos and PSD yields results that are not correct; neither does it mean that it is less
precise than the reduced product between Pos and SH for the computation of the
observables. More simply, the optimizations used in our current implementation of
PSD are not compatible with the above reduction process. Therefore, in Table 7 we
show the precision results obtained when comparing the base domain Pos × SFL2
with the domain Pos ⊗ SFL: the implementation of Pos ⊗ SFL, by avoiding
ρ-reductions, is not affected by the correctness problem mentioned above.
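
The reduction operator and the counterexample above can be replayed with a few lines of Python. This is only a sketch under simplifying assumptions: sharing groups are frozensets of variable names, the Pos element is given extensionally by its set of models, and the name `pos_reduce` is an illustrative choice rather than the name used in any implementation.

```python
def pos_reduce(models, sh, vi):
    """Keep the sharing groups S of sh such that VI \\ S is a model of phi_g."""
    return {s for s in sh if (vi - s) in models}

vi = frozenset("xyz")
sh = {frozenset(g) for g in ("xy", "xz", "yz", "xyz")}
sh_red = sh - {frozenset("xyz")}          # rho-reduced representation of sh

# Models of phi_g = (x <-> y <-> z), as stated in the text.
phi_g = {frozenset(), frozenset("xyz")}

reduced = pos_reduce(phi_g, sh, vi)            # contains only the group xyz
reduced_red = pos_reduce(phi_g, sh_red, vi)    # empty
```

Applying the reduction to the ρ-reduced representation loses the group xyz, which is exactly the mismatch described above.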

The precision comparison provides empirical evidence that Pos ⊗ SFL is more
effective than the combination considered in (Bagnara et al. 2000). However, as
indicated by the number of time-outs reported in Table 7, using Pos ⊗ SFL is not
feasible due to its intrinsic exponential complexity. We deliberately decided not to
include the time comparison, since it would have provided no information at all:
the efficiency degradations, which are largely caused by the lack of ρ-reductions,
should not be attributed to the enhanced combination with Pos. In this respect,
the reader looking for more details is referred to (Bagnara et al. 2002).

For the sole purpose of investigating how many precision improvements may
have been missed in the previous comparison due to the high number of time-outs,
we have performed another experimental evaluation where we have compared the
base domain Pos × SFL2 and the domain Pos ⊗ SFL2. We stress the fact that,
given the observation made previously, such a precision comparison provides an
over-estimation for the actual improvements that can be obtained by a correct
integration of the ρ-reduction and the 'reduce' operators. A detailed investigation
of the experimental data, which cannot be reported here for space reasons, has
shown that the number of precision improvements shown in Table 7 could at most
double. In particular, improvements are more likely to occur for goal-independent
analyses.



8 Ground-or-free Variables 

Most of the ideas investigated in the present work are based on earlier work by other
authors. In this section, we describe one originally proposed in (Bagnara et al. 2000).
Consider the analysis of the binding x = t and suppose that, on a set of computation
paths, this binding is reached with x ground while, on the remaining computation
paths, the binding is reached with x free. In both cases x will be linear and this is all
that will be recorded when using the usual combination Pos × SFL. This information
is valuable since, in the case that x and t are independent, it allows the star-union
operation for the relevant component for t to be dispensed with. However, the
information that is lost, that is, x being either ground or free, is equally valuable, since
this would allow the avoidance of the star-union of both the relevant components
for x and t, even when x and t may share. This loss has the disadvantages that
CPU time is wasted by performing unnecessary but costly operations and that the
precision is potentially degraded: not only are the extra star-unions useless for
correctness, but they may also introduce redundant sharing groups to the detriment of accuracy.
It is therefore useful to track the additional mode 'ground-or-free'.

The analysis domain SFL is extended with the component $\mathit{GF} \stackrel{\mathrm{def}}{=} \wp(VI)$
consisting of the set of variables that are known to be either ground or free. As for
freeness and linearity, the approximation ordering on GF is given by reverse subset
inclusion. When computing the abstract mgu on the new domain

$$\mathit{SGFL} \stackrel{\mathrm{def}}{=} \mathit{SH} \times F \times \mathit{GF} \times L,$$

the property of being ground-or-free is used and propagated in almost the same
way as freeness information.



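Definition 6

(Improved abstract operations over SGFL.) Let $d = (sh, f, gf, l) \in \mathit{SGFL}$. We
define the predicate $\mathit{gfree}_d \colon \mathit{Terms} \to \mathit{Bool}$ such that, for each first order term t,
where $V_t \stackrel{\mathrm{def}}{=} \mathit{vars}(t) \subseteq VI$,

$$\mathit{gfree}_d(t) \stackrel{\mathrm{def}}{=} \bigl(\mathrm{rel}(V_t, sh) = \emptyset\bigr) \lor \bigl(\exists x \in VI \mathrel{.} x = t \land x \in gf\bigr).$$

Consider the specification of the abstract operations over SFL given in the earlier
definition. The improved operator $\mathrm{amgu} \colon \mathit{SGFL} \times \mathit{Bind} \to \mathit{SGFL}$ is given by

$$\mathrm{amgu}(d, x = t) \stackrel{\mathrm{def}}{=} (sh', f', gf', l'),$$

where $f'$ and $l''$ are defined as in that specification and

$$
\begin{aligned}
sh' &= \overline{\mathrm{rel}}(V_{xt}, sh) \cup \mathrm{bin}(S_x, S_t); \\
S_x &= \begin{cases}
  R_x, & \text{if } \mathit{gfree}_d(x) \lor \mathit{gfree}_d(t) \lor \bigl(\mathit{lin}_d(t) \land \mathit{ind}_d(x, t)\bigr); \\
  R_x^\star, & \text{otherwise;}
\end{cases} \\
S_t &= \begin{cases}
  R_t, & \text{if } \mathit{gfree}_d(x) \lor \mathit{gfree}_d(t) \lor \bigl(\mathit{lin}_d(x) \land \mathit{ind}_d(x, t)\bigr); \\
  R_t^\star, & \text{otherwise;}
\end{cases} \\
gf' &= \bigl(VI \setminus \mathit{vars}(sh')\bigr) \cup gf''; \\
gf'' &= \begin{cases}
  gf, & \text{if } \mathit{gfree}_d(x) \land \mathit{gfree}_d(t); \\
  gf \setminus \mathit{vars}(R_x), & \text{if } \mathit{gfree}_d(x); \\
  gf \setminus \mathit{vars}(R_t), & \text{if } \mathit{gfree}_d(t); \\
  gf \setminus \mathit{vars}(R_x \cup R_t), & \text{otherwise;}
\end{cases} \\
l' &= gf' \cup l''.
\end{aligned}
$$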

The computation of the set $gf''$ is very similar to the computation of the set
$f'$ as given in the definition of the operations over SFL. The new ground-or-free
component $gf'$ is obtained by adding to $gf''$ the set of all the ground variables: in
other words, if a variable "loses freeness" then it also loses its ground-or-free status
unless it is known to be definitely ground. It can be noted that, in the computation of
this improved amgu, the ground-or-free property takes the role previously played by
freeness. In particular, when computing $sh'$, all the tests for freeness have been
replaced by tests on the newly defined Boolean function $\mathit{gfree}_d$; similarly, in the
computation of the new linearity component $l'$, the set $f'$ has been replaced by $gf'$ (since any
ground-or-free variable is also linear). It is also easy to generalize the improvement
for definitely cyclic bindings introduced in Definition 5 to the domain SGFL: as
before, the test $\mathit{free}_d(x)$ needs to be replaced with the new test $\mathit{gfree}_d(x)$.
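
The propagation of the ground-or-free component can be illustrated by a small Python sketch. It covers only the predicate $\mathit{gfree}_d$ and the computation of $gf''$ and $gf'$; the new sharing component is passed in as an argument instead of being computed. Representing a term by its set of variables plus a flag saying whether it is itself a variable is an assumption made here for brevity, as are all the function names.

```python
def rel(vs, sh):
    """Relevant component: groups of sh mentioning some variable in vs."""
    return {s for s in sh if s & vs}

def vars_of(sh):
    """All the variables occurring in the sharing groups of sh."""
    return frozenset().union(*sh)

def gfree(term_vars, term_is_var, sh, gf):
    """gfree_d(t): t is ground, or t is a variable known to be ground-or-free."""
    if not rel(term_vars, sh):
        return True
    return term_is_var and next(iter(term_vars)) in gf

def gf_second(gf, r_x, r_t, gfree_x, gfree_t):
    """The set gf'' of Definition 6 (four-way case analysis)."""
    if gfree_x and gfree_t:
        return gf
    if gfree_x:
        return gf - vars_of(r_x)
    if gfree_t:
        return gf - vars_of(r_t)
    return gf - vars_of(r_x | r_t)

def gf_first(vi, sh_new, gf2):
    """The set gf': the ground variables of sh' together with gf''."""
    return (vi - vars_of(sh_new)) | gf2

# Hypothetical example: binding x = y with x ground-or-free, y sharing with z.
vi = frozenset("xyz")
sh = {frozenset("x"), frozenset("yz")}
gf = frozenset("x")
vx, vt = frozenset("x"), frozenset("y")
gx, gt = gfree(vx, True, sh, gf), gfree(vt, True, sh, gf)
gf2 = gf_second(gf, rel(vx, sh), rel(vt, sh), gx, gt)
gf1 = gf_first(vi, {frozenset("xyz")}, gf2)   # sh' = {xyz} for this binding
```

In this example x is ground-or-free but y is not, so x loses its ground-or-free status once it is bound to y.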

To summarize, the incorporation of the set of ground-or-free variables is cheap, 
both in terms of computational complexity and in terms of code to be written. 
As far as computational complexity is concerned this extension looks particularly 
promising, since the possibility of avoiding star-unions has the potential of absorbing 
its overhead if not of giving rise to a speed-up. 

Thus the domain Pos × SGFL was experimentally evaluated on our benchmark
suite, with and without the structural information provided by Pattern(·), both in
a goal-dependent and in a goal-independent way, and the results compared with
those previously obtained for the domain Pos × SFL. Note that the implementation
uses the non-redundant version $\mathit{SGFL}_2 \stackrel{\mathrm{def}}{=} \mathit{PSD} \times F \times \mathit{GF} \times L$. In the precision
comparisons of Table 8 the new column labeled GF reports precision improvements
measured on the ground-or-free property itself.

As far as the timings are concerned, the experimentation fully confirms our
qualitative reasoning: efficiency improvements are more frequent than degradations and,
even with widening operators switched off, the distributions of the total analysis
times show minor changes only. As for precision, disregarding the many
improvements in the GF columns, few changes can be observed, and almost all of these
concern just the linearity information.

The results in Table 8 show that tracking ground-or-free variables, while being
potentially useful for improving the precision of a sharing analysis, rarely reaches
such a goal. In contrast, the precision gains on the ground-or-free property itself are
remarkable, affecting from 39% to 74% of the programs in the benchmark suite. It
is possible to foresee several direct applications for this information that, together
with the just mentioned negligible computational cost, fully justify the inclusion of
this enhancement in a static analyzer. In particular, there are at least two ways in
which a knowledge of ground-or-free variables could improve the concrete unification
procedure.

The first case applies in the context of occurs-check reduction (Søndergaard 1986;
Crnogorac et al. 1996), that is, when a program designed for a logic programming
system performing the occurs-check is to be run on top of a system omitting this
test. In order to ensure correct execution, all the explicit and implicit unifications in
the program are treated as if the ISO Prolog built-in unify_with_occurs_check/2
was used to perform them. In order to minimize the performance overhead, it is
important to detect, as precisely as possible and at compile-time, those NSTO
(short for Not Subject To the Occurs-check (Deransart et al. 1991; ISO/IEC 1995))
unifications where the occurs-check will not be needed. For these unifications, =/2



For this comparison, in the analysis using Pos × SFL, the number of ground-or-free variables is
computed by summing the number of ground variables with the number of free variables.

In fact the sole improvement to the number of independent pairs is due to a synthetic benchmark,
named gof, that was explicitly written to show that variable independence could be affected.
Goal Ind.

                          without Struct Info                     with Struct Info
Prec. class            O     I     G     F    GF     L         O     I     G     F    GF     L
p > 20               52.7   0.3    -     -   52.7    -       48.5   0.3    -     -   48.5    -
10 < p ≤ 20          11.7    -     -     -   11.7    -       16.0    -     -     -   16.0    -
5 < p ≤ 10            5.1    -     -     -    5.1    -        7.5    -     -     -    7.5    -
2 < p ≤ 5             2.4    -     -     -    2.4    -        1.8    -     -     -    1.8    -
0 < p ≤ 2             0.3    -     -     -    0.3   1.5       0.6    -     -     -    0.6   1.5
same precision       24.1  96.4  96.7  96.7  24.1  95.2      19.0  93.1  93.4  93.4  19.0  91.9
unknown               3.3   3.3   3.3   3.3   3.3   3.3       6.6   6.6   6.6   6.6   6.6   6.6

Goal Dep.

                          without Struct Info                     with Struct Info
Prec. class            O     I     G     F    GF     L         O     I     G     F    GF     L
p > 20                5.9    -     -     -    5.9    -        5.9    -     -     -    5.9    -
10 < p ≤ 20           4.5    -     -     -    4.5    -        5.4    -     -     -    5.4    -
5 < p ≤ 10            7.7   0.5    -     -    7.7    -        5.4   0.5    -     -    5.4    -
2 < p ≤ 5            13.1    -     -     -   13.1    -       12.2    -     -     -   12.2    -
0 < p ≤ 2             8.1    -     -     -    8.1   0.5      10.0    -     -     -   10.0    -
same precision       57.0  95.9  96.4  96.4  57.0  95.9      51.6  90.0  90.5  90.5  51.6  90.5
unknown               3.6   3.6   3.6   3.6   3.6   3.6       9.5   9.5   9.5   9.5   9.5   9.5

                                 Goal Ind.            Goal Dep.
Time diff. class              w/o SI  with SI      w/o SI  with SI
degradation > 1                  -      0.6           -      0.9
0.5 < degradation ≤ 1           0.3      -           0.5      -
0.2 < degradation ≤ 0.5          -      0.6          0.5     1.4
0.1 < degradation ≤ 0.2         0.3      -            -      0.5
both timed out                  3.3     6.6          3.6     9.5
same time                      88.6    85.2         87.3    82.8
0.1 < improvement ≤ 0.2         1.2     1.2          1.8     1.4
0.2 < improvement ≤ 0.5         2.4     2.4          1.8     0.9
0.5 < improvement ≤ 1           2.1     0.9          2.3     0.9
improvement > 1                 1.8     2.4          2.3     1.8

                            Goal Independent                       Goal Dependent
                      without SI         with SI           without SI         with SI
Total time class     %1    %2     Δ     %1    %2     Δ     %1    %2     Δ     %1    %2     Δ
timed out            3.3   3.3    -     6.6   6.6    -     3.6   3.6    -     9.5   9.5    -
t > 10               9.0   9.0    -     8.4   8.4    -     7.2   7.2    -     8.6   8.6    -
5 < t ≤ 10           0.3   0.3    -     1.5   1.5    -     1.4   1.4    -     1.8   1.8    -
1 < t ≤ 5            7.5   7.5    -     6.6   6.6    -     3.6   3.6    -     5.0   5.0    -
0.5 < t ≤ 1          2.7   2.7    -     3.3   3.6   0.3    5.4   5.9   0.5    3.2   3.2    -
0.2 < t ≤ 0.5        8.4   8.7   0.3   10.2  10.5   0.3   13.1  12.7  -0.5   13.6  14.0   0.5
t ≤ 0.2             68.7  68.4  -0.3   63.3  62.7  -0.6   65.6  65.6    -    58.4  57.9  -0.5

Table 8. Pos × SFL2 versus Pos × SGFL2.
can safely be used; for the remaining ones, the program will have to be transformed
so that unify_with_occurs_check/2 is explicitly called to perform them.
Ground-or-freeness can be of help for this application, since a unification between two
ground-or-free variables is NSTO. Note that this is an improvement with respect
to the technique used in (Crnogorac et al. 1996), since it is not required that the
two considered variables are independent.

As a second application, ground-or-freeness can be useful to replace the full
concrete unification procedure by a simplified version. Since a ground-or-free term is
either ground or free, a single run-time test for freeness will discriminate between
the two cases: if this test succeeds, unification can be implemented by a single
assignment; if the test fails, any specialized code for unification with a ground term
can be safely invoked. In particular, when unifying two ground-or-free variables
that are not free at run-time, the full unification procedure can be replaced by a
simpler recursive test for equivalence.
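
The specialization just described can be sketched on a toy term representation. The following Python fragment is an illustration only, not the code of any Prolog system: variables are instances of a hypothetical class `Var`, compound terms are tuples whose first element is the functor name, and atoms are strings. It assumes that both arguments are known, from the analysis, to be ground-or-free.

```python
class Var:
    """A logic variable: unbound when ref is None."""
    def __init__(self, name):
        self.name = name
        self.ref = None

def deref(t):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(t, Var) and t.ref is not None:
        t = t.ref
    return t

def equal_ground(a, b):
    """Recursive equality test; sufficient when both terms are ground."""
    a, b = deref(a), deref(b)
    if isinstance(a, tuple) and isinstance(b, tuple):
        return (len(a) == len(b) and a[0] == b[0]
                and all(equal_ground(p, q) for p, q in zip(a[1:], b[1:])))
    return a == b

def unify_ground_or_free(x, y):
    """Unify two terms that are each known to be either ground or free."""
    dx, dy = deref(x), deref(y)
    if dx is dy:
        return True
    if isinstance(dx, Var):      # x is free at run-time: a single assignment
        dx.ref = dy
        return True
    if isinstance(dy, Var):      # y is free at run-time: a single assignment
        dy.ref = dx
        return True
    return equal_ground(dx, dy)  # both ground: no bindings, no occurs-check
```

No occurs-check is needed in any branch: a free variable is bound either to another free variable or to a ground term, and two ground terms are only compared.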



9 More Precise Exploitation of Linearity 

In (King 1994), A. King proposes a domain for sharing analysis that performs
a quite precise tracking of linearity. Roughly speaking, each sharing group in a
sharing-set carries its own linearity information. In contrast, in the approach of
(Langen 1990), which is the one usually followed, a set of definitely linear variables
is recorded along with each sharing-set. The proposal in (King 1994) gives rise to
a domain that is quite different from the ones presented here. Since (King 1994)
does not provide an experimental evaluation and we are unaware of any subsequent
work on the subject, the question whether this more precise tracking of linearity is
actually worthwhile (both in terms of precision and efficiency) seems open.

What interests us here is that part of the theoretical work presented in (King 1994)
may be usefully applied even in the more classical treatments of linearity such as
the one being used in this paper. As far as we can tell, this fact was first noted
in (Bagnara et al. 2000).

In (King 1994), point 3 of Lemma 5 (which is reported to be proven in (King 1993))
states that, if s is a linear term independent from a term t, then in the unifier for
s = t any sharing between the variables in s is necessarily caused by those variables
that can occur more than once in t.

This result can be exploited even when using the domain SFL. Given the abstract
element $d = (sh, f, l)$, let $x \in (l \setminus f)$ be a non-free but linear variable and let t be
a non-linear term such that $\mathit{ind}_d(x, t)$. Let also $V_x$, $V_t$, $V_{xt}$, $R_x$ and $R_t$ be as given
in the definition of the abstract operations over SFL. In such a situation, when
abstractly evaluating the binding x = t, the standard amgu operator gives the
set-sharing component

$$sh' = \overline{\mathrm{rel}}(V_{xt}, sh) \cup \mathrm{bin}(R_x^\star, R_t).$$

Suppose the set $V_t$ is partitioned into the two components $V_t^{l}$ and $V_t^{nl}$, where $V_t^{nl}$
is the set of the "problematic" variables, that is, those variables that potentially
                         Goal Independent                  Goal Dependent
Prec. class            O     I     G     F     L         O     I     G     F     L
p > 20                0.3   0.3    -     -     -         -     -     -     -     -
2 < p ≤ 5              -     -     -     -     -        0.5   0.5    -     -     -
same precision       93.1  93.1  93.4  93.4  93.4      90.0  90.0  90.5  90.5  90.5
unknown               6.6   6.6   6.6   6.6   6.6       9.5   9.5   9.5   9.5   9.5

                                  % benchmarks
Time difference class         Goal Ind.  Goal Dep.
degradation > 1                  0.3        -
0.5 < degradation ≤ 1             -         -
0.2 < degradation ≤ 0.5           -         -
0.1 < degradation ≤ 0.2          0.3       0.5
both timed out                   6.6       9.5
same time                       85.2      83.7
0.1 < improvement ≤ 0.2          0.9       1.8
0.2 < improvement ≤ 0.5          2.4       0.5
0.5 < improvement ≤ 1            0.6       2.7
improvement > 1                  3.6       1.4

                         Goal Ind.            Goal Dep.
Total time class     %1    %2     Δ       %1    %2     Δ
timed out            6.6   6.6    -       9.5   9.5    -
t > 10               8.4   8.4    -       8.6   8.6    -
5 < t ≤ 10           1.5   1.5    -       1.8   1.8    -
1 < t ≤ 5            6.6   6.6    -       5.0   5.0    -
0.5 < t ≤ 1          3.3   3.3    -       3.2   3.2    -
0.2 < t ≤ 0.5       10.2  11.1   0.9     13.6  14.0   0.5
t ≤ 0.2             63.3  62.3  -0.9     58.4  57.9  -0.5

Table 9. The effect of enhanced linearity on Pattern(Pos × SFL2).



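make t a non-linear term. Formally,

$$
V_t^{l} \stackrel{\mathrm{def}}{=} \left\{\, y \in \mathit{vars}(t) \;\middle|\;
\begin{array}{l}
y \in l \\
y \in \mathit{mvars}(t) \implies y \notin \mathit{vars}(sh) \\
\forall z \in \mathit{vars}(t) : \bigl(y = z \lor \mathit{ind}_d(y, z)\bigr)
\end{array}
\right\},
\qquad
V_t^{nl} \stackrel{\mathrm{def}}{=} V_t \setminus V_t^{l}.
$$

Let $R_t^{l} = \mathrm{rel}(V_t^{l}, sh)$ and $R_t^{nl} = \mathrm{rel}(V_t^{nl}, sh)$. Note that $R_t^{nl} \neq \emptyset$, because t is a
non-linear term. If also $R_t^{l} \neq \emptyset$ then the standard amgu can be replaced by an improved
version (denoted by $\mathrm{amgu}_k$) computing the following set-sharing component:

$$sh'_k = \overline{\mathrm{rel}}(V_{xt}, sh) \cup \mathrm{bin}(R_x, R_t^{l}) \cup \mathrm{bin}(R_x^\star, R_t^{nl}).$$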
As a consequence of King's result (King 1994, Lemma 5), only $R_t^{nl}$ (the relevant
component of sh with respect to the problematic variables $V_t^{nl}$) has to be combined
with $R_x^\star$ while $R_t^{l}$ can be combined with just $R_x$ (without the star-union).

For a working example, suppose $VI = \{v, w, x, y, z\}$ is the set of variables of
interest and consider the SFL element

$$d \stackrel{\mathrm{def}}{=} \bigl(\{vx, wx, y, z\}, \{v, w, y\}, \{v, w, x, y\}\bigr)$$

with the binding $x = f(y, z)$. Note that all the applicability conditions specified
above are met: in particular $t = f(y, z)$ is not linear because $z \notin l$. As $R_x = \{vx, wx\}$
and $R_t = \{y, z\}$, a standard analysis would compute

$$d' = \mathrm{amgu}\bigl(d, x = f(y, z)\bigr) = \bigl(\{vwxy, vwxz, vxy, vxz, wxy, wxz\}, \emptyset, \{y\}\bigr).$$

On the other hand, since $V_t^{l} = \{y\}$ and $V_t^{nl} = \{z\}$, the enhanced analysis would
compute

$$d'_k = \mathrm{amgu}_k\bigl(d, x = f(y, z)\bigr) = \bigl(\{vwxz, vxy, vxz, wxy, wxz\}, \emptyset, \{y\}\bigr).$$

Note that $d'_k$ does not include the sharing group vwxy. This means that, if in the
sequel of the computation variable z is bound to a ground term, then variables
v and w will be known to be definitely independent. This independence is not
captured when using the standard amgu since $d'$ includes the sharing group vwxy,
and therefore the variables v and w will potentially share even after grounding z.
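
The worked example can be checked mechanically. The Python sketch below computes only the set-sharing components of the standard and of the enhanced operator for this specific binding; the partition of the variables of t into linear and problematic ones is taken from the text rather than computed, and all helper names are illustrative assumptions.

```python
def rel(vs, sh):
    """Relevant component: groups of sh mentioning some variable in vs."""
    return {s for s in sh if s & vs}

def star(sh):
    """Star-union: closure of sh under union of its sharing groups."""
    closed = set(sh)
    while True:
        new = {a | b for a in closed for b in closed} - closed
        if not new:
            return closed
        closed |= new

def bin_union(s1, s2):
    """Binary union: pairwise unions of a group of s1 with a group of s2."""
    return {a | b for a in s1 for b in s2}

def groups(*names):
    return {frozenset(n) for n in names}

sh = groups("vx", "wx", "y", "z")
v_x, v_t = frozenset("x"), frozenset("yz")
v_t_lin, v_t_nonlin = frozenset("y"), frozenset("z")   # partition from the text

r_x = rel(v_x, sh)
rest = sh - rel(v_x | v_t, sh)       # groups not relevant to the binding

standard = rest | bin_union(star(r_x), rel(v_t, sh))
enhanced = (rest
            | bin_union(r_x, rel(v_t_lin, sh))
            | bin_union(star(r_x), rel(v_t_nonlin, sh)))

def ground(sh, var):
    """Effect of grounding var: drop every group that mentions it."""
    return {s for s in sh if var not in s}

def may_share(sh, a, b):
    return any(a in s and b in s for s in sh)
```

After grounding z, `may_share` still reports possible sharing between v and w for the standard result, but not for the enhanced one.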

The experimental evaluation for this enhancement is reported in Table 9. The
comparison of times shows that the efficiency of the analysis, when affected, is
more likely to be improved than degraded. As for the precision, improvements are
observed for only two programs; moreover, these are synthetic benchmarks such as
the above example. Nevertheless, despite its limited practical relevance, this result
demonstrates that the standard combination of Sharing with linearity information
is not optimal, even when all the possible orderings of the non-grounding bindings
are tried.

10 Sharing and Freeness

As noted by several authors (Bruynooghe et al. 1994a; Bueno et al. 1994; Cabeza and Hermenegildo 1994),
the standard combination of Sharing and Free is not optimal. G. Filé (Filé 1994)
formally identified the reduced product of these domains and proposed an improved
abstract unification operator. This new operator exploits two properties that hold
for the most precise abstract description of a single concrete substitution:

1. each free variable occurs in exactly one sharing group;

2. two free variables occur in the same sharing group if and only if they are
aliases (i.e., they have become the same variable).
When considering the general case, where sets of concrete substitutions come into
play, property 1 can be used to (partially) recover disjunctive information. In
particular, it is possible to decompose an abstract description into a set of (maximal)
descriptions that necessarily come from different computation paths, each one
satisfying property 1. The abstract unification procedure can thus be computed
separately on each component, and the results of each subcomputation are then joined
to give the final description. As such components are more precise than the original
description (they possibly contain more ground variables and fewer sharing pairs),
precision gains can be obtained.

Furthermore, by exploiting property 2 on each component, it is possible to
correctly infer that for some of them the computation will fail due to a functor clash
(or to the occurs-check, if considering a system working on finite trees). Note that a
similar improvement is possible even without decomposing the abstract description.
As an example, consider an abstract element such as the following:

$$d = \bigl(\{xy, u, v\}, \{x, y\}, \{x, y\}\bigr).$$

Since the sharing group xy is the only one where the free variables x and y occur,
property 2 states that x and y are indeed the same variable in all the concrete
computation states described by $d \in \mathit{SFL}$. Therefore, when abstractly evaluating
the substitution $\{x = f(u), y = g(v)\}$, it can be safely concluded that its concrete
counterparts will result in failure due to the functor clash. In the same
circumstances, it can also be concluded that a concrete substitution corresponding to, say,
$\{x = f(y)\}$ will cause a failure of the occurs-check, if this is performed.

As was the case for the reduced product between Pos and SH (see Section 7), the
interaction between the enhanced abstract unification operator and the elimination
of ρ-redundant elements can lead to results that are not correct.

To see this, let $VI = \{w, x, y, z\}$ and consider the set of concrete substitutions
$\Sigma = \wp(\sigma)$, where $\sigma = \{x \mapsto v, y \mapsto v, z \mapsto v\}$ (note that $v \notin VI$). The abstract
element describing $\Sigma$ is $d = (sh, f, l) \in \mathit{SFL}$, where $sh = \{w, x, xy, xyz, xz, y, yz, z\}$
and $f = l = VI$. Suppose that the implementation represents d by using the reduced
element $d_{red} = (sh_{red}, f, l)$, where $sh_{red} = sh \setminus \{xyz\}$, so that $sh = \rho(sh_{red})$.

According to the specification of the enhanced operator, $d_{red}$ can be decomposed
into the following four components:

$$c_1 = \bigl(\{w, x, y, z\}, f, l\bigr), \qquad c_3 = \bigl(\{w, xz, y\}, f, l\bigr),$$
$$c_2 = \bigl(\{w, x, yz\}, f, l\bigr), \qquad c_4 = \bigl(\{w, xy, z\}, f, l\bigr).$$

Consider the binding $x = f(y, w)$ and, for each $i \in \{1, \ldots, 4\}$, the computation
of $c'_i = (sh'_i, f'_i, l'_i) = \mathrm{amgu}\bigl(c_i, x = f(y, w)\bigr)$, where we have $l'_1 = l'_2 = l'_3 = VI$ and
$l'_4 = \{w, z\}$. In all four cases, we have $z \in l'_i$, so that z keeps its linearity even after
merging the results of the four subcomputations into a single abstract description.

In contrast, when performing the same computation with the original abstract
description d, in the decomposition phase we also obtain a fifth component,

$$c_5 = \bigl(\{w, xyz\}, f, l\bigr).$$
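
The decomposition step of this example can be reproduced with a short Python sketch. It enumerates the sub-descriptions in which every free variable occurs in exactly one sharing group; keeping all the groups that contain no free variable in every component is an assumption of this sketch (the example has no such group), and the function name is illustrative.

```python
def decompose(sh, free):
    """Sharing components where each free variable occurs in exactly one group."""
    base = {s for s in sh if not (s & free)}        # groups without free variables
    candidates = sorted((s for s in sh if s & free), key=sorted)
    results = []

    def search(remaining, chosen, start):
        if not remaining:                           # every free variable is covered
            results.append(frozenset(base | set(chosen)))
            return
        for i in range(start, len(candidates)):
            s = candidates[i]
            if (s & free) <= remaining:             # no free variable covered twice
                search(remaining - s, chosen + [s], i + 1)

    search(frozenset(free), [], 0)
    return results

vi = frozenset("wxyz")
sh = {frozenset(g) for g in ("w", "x", "xy", "xyz", "xz", "y", "yz", "z")}
sh_red = sh - {frozenset("xyz")}

components_red = set(decompose(sh_red, vi))   # the four components c1..c4
components = set(decompose(sh, vi))           # the same four plus c5
```

The only component found for sh but not for its ρ-reduced form is the one made of the groups w and xyz.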
Goal Independent
0.3






0.6 
same precision
87.0
unknown


4.2 


4.2 
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4.2 
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9.9 


9.9 


9.9 


9.9 
























Goal Dependent 


without Struct Info 


with Struct Info 


Prcc. class 


O 


I 


G 


F 


L 
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same precision 


96.4 


96.1 


9(i.l 


1 


9G.4 


89.6 


89.6 


89.(3 


89.0 


89.6 
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3.6 


3.6 


3.6 


3.6 


3.6 


10.4 


10.4 


10.4 


10.4 


10.4 
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(Toal Dep. 




w/o SI 
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w/o SI 


with SI 


degradation > 1 


9.6 


13.6 


3.2 


5.9 


0.5 < degradation < 1 


0.6 


1.8 


1.4 


1.4 


0.2 < degradation < 0.5 


3.3 


2.4 


1.8 


3.6 


0.1 < degradation < 0.2 


0.6 


1.5 


2.3 


1.4 


both timed out 


3.3 


6.6 


3.6 


9.5 


same time 


82.2 


73.5 


87.8 


77.8 


0.1 < improvement < 0.2 










0.2 < improvement < 0.5 


0.3 








0.5 < improvement < 1 










improvement > 1 




0.6 




0.5 



Total time class 
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Goal Dependent 




without SI 


with SI 
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with SI 


%1 


%2 


A 


%1 


%2 


A 


%1 


%2 


A 


%1 


%2 


A 


timed out 


3.3 


4.2 


0.9 


6.6 


9.9 


3.3 


3.6 


3.6 




9.5 


10.4 


0.9 


t > 10 


9.0 


9.6 


0.6 


8.4 


8.4 




7.2 


7.2 




8.6 


8.1 


-0.5 


5 < t < 10 


0.3 


0.9 


0.6 


1.5 


1.2 


-0.3 


1.4 


1.4 




1.8 


1.8 




l<t<5 


7.5 


6.9 


-0.6 


6.6 


5.7 


-0.9 


3.6 


3.6 




5.0 


4.5 


-0.5 


0.5 < i < 1 


2.7 


2.1 


-0.6 


3.3 


4.5 


1.2 


5.4 


5.9 


0.5 


3.2 


3.2 




0.2 <t< 0.5 


8.4 


8.4 




10.2 


12.0 


1.8 


13.1 


12.7 


-0.5 


13.6 


14.9 


1.4 


t < 0.2 


68.7 


67.8 


-0.9 


63.3 


58.1 


-5.1 


65.6 


65.6 




58.4 


57.0 


-1.4 



Table 10. The effect of enhanced freeness on Pos x SFLa. 



When computing $c'_5 = (sh'_5, f'_5, l'_5) = \mathrm{amgu}(c_5, x = f(y, w))$, we obtain
$l'_5 = \{w\}$, so that $z$ loses its linearity when merging the five results into a
single abstract description. Note that this is not an avoidable precision loss,
since in the concrete computation path corresponding to the substitution $\sigma$ we
would have computed

\[ \sigma' = \{x \mapsto f(x, w), y \mapsto f(y, w), z \mapsto f(z, w)\}, \]

where $z$ is bound to a non-linear term (namely, an infinite rational term with an
infinite number of occurrences of variable $w$). Therefore, the result obtained when
using the abstract description $d_{red}$ is not correct.
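The decomposition used in this example can be enumerated mechanically. The sketch below is a hedged Python illustration, not the China implementation. It assumes that a component keeps every sharing group without free variables and selects, for each free variable, exactly one group containing it. The function names are hypothetical.

```python
def group(s):
    """Build a sharing group from a string of one-letter variable names."""
    return frozenset(s)

def decompose(sh, free):
    """Enumerate the components of sh in which every free variable occurs
    in exactly one sharing group (an assumed reading of the decomposition)."""
    free = frozenset(free)
    no_free = [g for g in sh if not (g & free)]  # kept in every component
    with_free = [g for g in sh if g & free]
    results = []

    def extend(chosen, covered):
        missing = sorted(free - covered)
        if not missing:
            results.append(frozenset(chosen) | frozenset(no_free))
            return
        v = missing[0]  # always branch on the first uncovered free variable
        for g in with_free:
            # g must contain v and no free variable that is already covered
            if v in g and not (g & covered):
                extend(chosen + [g], covered | (g & free))

    extend([], frozenset())
    return results

sh = {group(s) for s in ["w", "x", "xy", "xyz", "xz", "y", "yz", "z"]}
sh_red = sh - {group("xyz")}

print(len(decompose(sh_red, "wxyz")))  # 4
print(len(decompose(sh, "wxyz")))      # 5
```

With all four variables free, the reduced description yields the four components $c_1, \ldots, c_4$, while the unreduced one also yields $(\{w, xyz\}, f, l)$, the component responsible for $z$ losing its linearity.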

As already observed in Section 7, the above correctness problem lies not in the
$\mathit{SFL}_2$ domain itself, but rather in our optimized implementation, which removes
the $\rho$-redundant elements from the set-sharing description.

We implemented the first idea by Filé (i.e., the exploitation of property 1) on
the usual base domain $\mathit{Pos} \times \mathit{SFL}_2$. As noted above, this implementation may
yield results that are not correct: the precision comparison reported in Table 10
provides an over-estimation of the actual improvements that could be obtained by
a correct implementation. However, it is not possible to assess the magnitude of
this over-estimation, since our implementation of this enhancement on the domain
$\mathit{Pos} \times \mathit{SFL}$, where no $\rho$-redundancy elimination is performed, times out on a
large fraction of the benchmarks. The results in Table 10 show that precision
improvements are only observed for goal-independent analysis. When looking at the
time comparisons, it should be observed that the analysis of several programs had
to be stopped because of the combinatorial explosion in the decomposition, even
though we used the domain $\mathit{Pos} \times \mathit{SFL}_2$. Among the proposals experimentally
evaluated in this paper, this one shows the worst trade-off between cost and
precision.

Note that, in principle, such an approach to the recovery of disjunctive
information can be pursued beyond the integration of sharing with freeness. In
fact, by exploiting the ground-or-free information as in Section 8, it is possible
to obtain decompositions where each component contains at most one occurrence (in
contrast with the exactly one occurrence of Filé's idea) of each ground-or-free
variable. In each component, the ground-or-free variable could then be "promoted"
as either a ground variable (if it does not occur in the sharing groups of that
component) or as a free variable (if it occurs in exactly one sharing group).
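As a small illustration of the "promotion" step only, the following Python sketch classifies the ground-or-free variables of a single component. It assumes the component has already been built so that each such variable occurs in at most one sharing group. The function name and the string labels are hypothetical.

```python
def promote(component, gof_vars):
    """Classify each ground-or-free variable of one component as 'ground'
    (it occurs in no sharing group) or 'free' (it occurs in exactly one)."""
    result = {}
    for v in gof_vars:
        occurrences = sum(1 for g in component if v in g)
        if occurrences == 0:
            result[v] = "ground"
        elif occurrences == 1:
            result[v] = "free"
        else:
            # The decomposition is assumed to rule this case out.
            raise ValueError(f"{v} occurs in {occurrences} sharing groups")
    return result

component = {frozenset("xy"), frozenset("z")}
print(promote(component, "wx"))  # {'w': 'ground', 'x': 'free'}
```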

It would be interesting to experiment with the second idea of Filé. However,
such a goal would require a substantial implementation effort, since at present
there is no easy way to incorporate this enhancement into the modular design of
the China analyzer. Roughly speaking, the SFL component should be able to produce
some new (implicit) structural information and notify it to the enclosing
Pattern(·) component, which would then need to combine this information with the
(explicit) structural information already available. However, in order to be able
to receive notifications from its parameter, the Pattern(·) component, which is
implemented as a C++ template, would have to be heavily modified.

11 Tracking Compoundness

In (Bruynooghe et al. 1994a; Bruynooghe et al. 1994b), Bruynooghe and colleagues
considered the combination of the standard set-sharing, freeness, and linearity
domains with compoundness information. As for freeness and linearity,
compoundness was represented by the set of variables that definitely have the
corresponding property.

As discussed in (Bruynooghe et al. 1994a; Bruynooghe et al. 1994b), compoundness
information is useful in its own right for clause indexing. Here, though, the
focus is on improving sharing information, so that the question to be answered is:
can the tracking of compoundness improve the sharing analysis itself? This question
is also considered in (Bruynooghe et al. 1994a; Bruynooghe et al. 1994b), where a
technique is proposed that exploits the combination of sharing, freeness and
compoundness. This technique relies on the presence of the occurs-check.

Informally, consider the binding $x = t$ together with an abstract description
where $x$ is a free variable, $t$ is a compound term and $x$ definitely shares with $t$.
Since $x$ is free, $x$ is aliased to one of the variables occurring in $t$. As a
consequence, the execution of the binding $x = t$ will fail due to the occurs-check.
In a more general case, when only possible sharing information is available, the
precision of the abstract description can be safely improved by removing, just
before computing the abstract binding, all the sharing groups containing both $x$
and a variable in $t$. In addition, if this reduction step removes all the sharing
groups containing a free variable, then it can be safely concluded that the
computation will fail.

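To see how this works in practice, consider the binding $x = f(y, z)$ and the
description $d_1 \stackrel{\mathrm{def}}{=} (sh_1, f_1, l_1) \in SFL$ such that

\[
\begin{aligned}
sh_1 &\stackrel{\mathrm{def}}{=} \{wx, xy, xz, y, z\}, \\
f_1 &\stackrel{\mathrm{def}}{=} \{x\}, \\
l_1 &\stackrel{\mathrm{def}}{=} \{w, x, y, z\}.
\end{aligned}
\]

Since $x$ is free and $f(y, z)$ is compound, the sharing groups $xy$ and $xz$ can be
removed so that the amgu computation will give the set-sharing and linearity
components

\[
\begin{aligned}
sh'_1 &\stackrel{\mathrm{def}}{=} \{wxy, wxz\}, \\
l'_1 &\stackrel{\mathrm{def}}{=} \{w, x, y, z\}
\end{aligned}
\]

instead of the less precise

\[
\begin{aligned}
sh'_1 &\stackrel{\mathrm{def}}{=} \{wxy, wxz, xy, xyz, xz\}, \\
l'_1 &\stackrel{\mathrm{def}}{=} \{w\}.
\end{aligned}
\]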

Note that the precision improvement of this particular example could also be
obtained by applying, in its full generality, the second technique proposed by
Filé and sketched in the previous section. This is because the term with which $x$
is unified is "explicitly" compound. However, if the term $t$ were "implicitly"
compound (i.e., if it were an abstract variable known to represent compound
terms), then the technique by Filé would not be applicable. For example, consider
the binding $x = y$ and the description $d_2 \stackrel{\mathrm{def}}{=} (sh_2, f_2, l_2) \in SFL$ such that

\[
\begin{aligned}
sh_2 &\stackrel{\mathrm{def}}{=} \{wx, xyz, y\}, \\
f_2 &\stackrel{\mathrm{def}}{=} \{x\}, \\
l_2 &\stackrel{\mathrm{def}}{=} \{w, x, y, z\}
\end{aligned}
\]

supplemented by a compoundness component ensuring that $y$ is compound. Then
the sharing group $xyz$ can be removed so that the amgu will compute

\[
\begin{aligned}
sh'_2 &\stackrel{\mathrm{def}}{=} \{wxy\}, \\
l'_2 &\stackrel{\mathrm{def}}{=} \{w, x, y, z\}
\end{aligned}
\]

instead of

\[ sh'_2 \stackrel{\mathrm{def}}{=} \{wxy, wxyz, xyz\} \]

and a less precise linearity component $l'_2$.
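The set-sharing components of the two examples above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: it models the set-sharing part of amgu for the case where $x$ is free (binary union of the relevant groups, with no star-closure), it ignores the freeness and linearity components, and all function names are hypothetical.

```python
def groups(text):
    """Parse 'wx xy z' into a set of sharing groups."""
    return {frozenset(s) for s in text.split()}

def rel(sh, variables):
    """Sharing groups of sh containing at least one of the given variables."""
    return {g for g in sh if g & variables}

def amgu_free(sh, x, t_vars):
    """Set-sharing part of amgu for x = t, assuming x is free."""
    rx, rt = rel(sh, {x}), rel(sh, t_vars)
    return (sh - rx - rt) | {a | b for a in rx for b in rt}

def reduce_compound(sh, x, t_vars):
    """Reduction step: drop groups holding both x and a variable of t.
    Only meaningful when x is free, t is compound and the occurs-check
    is performed."""
    return {g for g in sh if not (x in g and g & t_vars)}

# First example: x = f(y, z) with sh_1 = {wx, xy, xz, y, z}.
sh1, t1 = groups("wx xy xz y z"), {"y", "z"}
plain1 = amgu_free(sh1, "x", t1)
better1 = amgu_free(reduce_compound(sh1, "x", t1), "x", t1)

# Second example: x = y, with y known to be compound, sh_2 = {wx, xyz, y}.
sh2, t2 = groups("wx xyz y"), {"y"}
plain2 = amgu_free(sh2, "x", t2)
better2 = amgu_free(reduce_compound(sh2, "x", t2), "x", t2)
```

Here `better1` and `better2` are the sets $\{wxy, wxz\}$ and $\{wxy\}$ obtained with the reduction step, while `plain1` and `plain2` are the larger sets computed without it.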



To see how a knowledge of the compoundness can be used to identify definite
failure, consider the unification $x = f(y, z)$ and the description
$d_3 \stackrel{\mathrm{def}}{=} (sh_3, f_3, l_3) \in SFL$ such that

\[
\begin{aligned}
sh_3 &\stackrel{\mathrm{def}}{=} \{wxy, wxz, x, y, z\}, \\
f_3 &\stackrel{\mathrm{def}}{=} \{w, x\}, \\
l_3 &\stackrel{\mathrm{def}}{=} \{w, x, y, z\}.
\end{aligned}
\]

As in the examples above, variable $x$ is free and term $t \stackrel{\mathrm{def}}{=} f(y, z)$ is compound so
that, by applying the reduction step, we can remove the sharing groups $wxy$ and
$wxz$. However, this has removed all the sharing groups containing the free variable
$w$, resulting in an inconsistent computation state.
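The failure test described here can be sketched as follows. The Python code below is illustrative only: it assumes that $x$ is free, that $t$ is compound and that the occurs-check is performed, and the function name is hypothetical.

```python
def definitely_fails(sh, free, x, t_vars):
    """Apply the reduction step for x = t and report whether some free
    variable no longer occurs in any remaining sharing group."""
    reduced = {g for g in sh if not (x in g and g & t_vars)}
    still_present = set().union(*reduced)  # variables left in some group
    return any(v not in still_present for v in free)

sh3 = {frozenset(s) for s in ["wxy", "wxz", "x", "y", "z"]}
sh1 = {frozenset(s) for s in ["wx", "xy", "xz", "y", "z"]}

print(definitely_fails(sh3, {"w", "x"}, "x", {"y", "z"}))  # True
print(definitely_fails(sh1, {"x"}, "x", {"y", "z"}))       # False
```

For $d_3$ the reduction leaves only $\{x, y, z\}$, so the free variable $w$ has no sharing group left and failure is detected. For $d_1$ the free variable $x$ still occurs in $wx$, so no failure is reported.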

We did not implement this technique, since it is only sound for the analysis
of systems performing the occurs-check, whereas we are targeting the analysis
of systems possibly omitting it. Nonetheless, an experimental evaluation would
be interesting for assessing how much this precision improvement can affect the
accuracy of applications such as occurs-check reduction.



12 Conclusion 

In this paper we have investigated eight enhanced sharing analysis techniques that,
at least in principle, have the potential for improving the precision of the sharing
information over and above that obtainable using the classical combination of
set-sharing with freeness and linearity information. These techniques either make
better use of the already available sharing information, by defining more powerful
abstract semantic operators, or combine this sharing information with that captured
by other domains. Our work has been systematic since, to the best of our knowledge,
we have considered all the proposals that have appeared in the literature: that is,
better exploitation of groundness, freeness, linearity, compoundness, and structural
information.

Using the China analyzer, seven of the eight enhancements have been exper- 
imentally evaluated. Because of the availability of a very large benchmark suite, 
including several programs of respectable size, the precision results are as conclu- 
sive as possible and provide an almost complete account of what is to be expected 
when analyzing any real program using these domains. 

The results demonstrate that good precision improvements can be obtained with 
the inclusion of explicit structural information. For the groundness domain Pos, 
several good reasons have been given as to why it should be combined with set- 
sharing. As for the remaining proposals, it is hard to justify them as far as the 
precision of the analysis is concerned. 

Regarding the efficiency of the analysis, it has been explained why the reported 
time comparisons can be considered as upper bounds to the additional cost re- 
quired by the inclusion of each technique. Moreover, it has been argued that, from 
this point of view, the addition of a 'ground-or-free' mode and the more precise 
exploitation of linearity are both interesting: they are not likely to affect the cost 
of the analysis and, when this is the case, they usually give rise to speed-ups. 

No further positive indications can be derived from the precision and time com- 
parisons of the remaining techniques. In particular, it has not been possible to 
identify a good heuristic for the reordering of the non-grounding bindings. The ex- 
perimentation suggests that sensible precision improvements cannot be expected 
from this technique. When considering these negative results, the reader should be
aware that the precision gains are measured with respect to an analysis tool built on
the base domain $\mathit{Pos} \times \mathit{SFL}$ which, to our knowledge, is the most accurate sharing
analysis tool ever implemented.

The experimentation reported in this paper resulted in both positive and negative
indications. We believe that all of these will provide the right focus in the design
and development of useful tools for sharing analysis.